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Abstract. We give a overview of stochastic models of evolution that have found 
applications in genetics, ecology and linguistics for an audience of nonspecialists, 
especially statistical physicists. In particular, we focus mostly on neutral models in 
which no intrinsic advantage is ascribed to a particular type of the variable unit, for 
example a gene, appearing in the theory. In many cases these models are exactly 
solvable and furthermore go some way to describing observed features of genetic, 
ecological and linguistic systems. 
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1. Introduction 

The distinguishing feature of statistical mechanics as a theory of many-body physics 
is that it has probabilistic notions at its core. At equilibrium, the principle of equal 
a priori probabilities is invoked as a means to identify the typical behaviour of a 
system as a whole. Out of equilibrium, the evolution of a system is usually couched 
in terms of stochastic dynamics, as a glance at many articles in this journal will 
confirm. The procedure of stochastic modelling is not, of course, restricted to physics. In 
particular, stochastic models have played a pivotal role in understanding the dynamics 
of evolutionary systems and as such one sees many similarities in the approaches and 
methods that have been used to those employed by statistical physicists. The most well- 
established discipline for which this is the case is population genetics, i.e., the study of 
gene frequencies among a population of reproducing individuals. More recently, models 
very similar to those that arise in the population genetics arena have also been applied 
in ecology (where the frequencies of particular species in an ecosystem are of interest) 
and language change (where now the focus is on frequencies of linguistic variants used 
by a community of speakers). In this article, it is our aim to introduce key concepts 
and results from these fields to practitioners in statistical mechanics in order to further 
cross-fertilisation between them. 

Mostly, we will concentrate on the application to population genetics: due to this 
discipline's age, most of the mathematical literature describing the stochastic models of 
interest are couched in terms of the language of population genetics. We shall therefore 
begin by introducing the relevant terminology. In the simplified mathematical models 
we shall be describing, a gene is taken to mean a variable unit that codes for a specific 
trait (such as eye colour), and different variants of the gene (coding in this example 
for different eye colours) are called alleles. Changes in allele frequency can occur as a 
consequence of three processes. First, there is reproduction, in which offspring organisms 
acquire copies of alleles from their parents. Secondly, in the copying process, random 
mutations can occur which may change one allele to another, or create a completely new 
allele. Finally, for whatever reason, organisms carrying one allele may end up having 
more offspring than another: this is selection. 

The simplest stochastic model of an evolving population dates from the 1930's and 
was introduced independently by Ronald Fisher [1] and Sewall Wright [2J. In this model, 
the number of organisms is taken to be constant from one generation to the next, and 
each instance of a gene in one generation is an exact copy of one randomly chosen with 
replacement from the previous generation. Such a process could be realised in a number 
of ways. Let us consider first of all haploid organisms which carry only one instance of 
each gene and (possibly) reproduce asexually. A new generation could then be formed 
by each of the organisms producing an infinite number of gametes (e.g., seeds) before 
dying; then, a random fixed-size sample of these gametes survive to become offspring 
organisms. More often, geneticists focus on diploid organisms which carry two copies 
of each gene and reproduce sexually. Now, each organism produces two infinite sets of 
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gametes, one for each allele carried by the parent. Then, after dying, the next generation 
is formed by combining the requisite number of randomly-chosen gamete pairs. Either 
way, the mean number of offspring that any individual has in the following generation 
is one, and so there is no selection: this is a neutral model. Furthermore the fact that 
the allele instances present in one generation are a random sample of those present in 
the previous generation means that allele frequencies fluctuate through sampling effects 
alone. These fluctuations are called genetic drift and — in the absence of any mutations — 
over time cause permanent extinction of one or more alleles: The steady state of such a 
model is one in which only a single allele remains: this allele is then said to have fixed 
in the population. 

A haploid, or randomly-mating diploid population evolving in this way is often 
referred to as an ideal population. In reality, individuals do not mate at random, 
and there is often a preference, or requirement, for mating to occur between or within 
different classes of individuals. Most obviously, in sexually reproducing organisms, 
females can mate only with males; other factors, such as individuals' geographical 
locations and age (when one has overlapping generations) may also lead to deviations 
from non-ideal behaviour. Nevertheless, in some cases it is found that the predictions 
of the ideal model are relevant if one replaces the actual size of the population with an 
effective size, which can be thought of as a number of breeding individuals. We will 
discuss this procedure in more detail later in this article. 

Neutral theories in population genetics (as well as in the ecological and linguistic 
applications — see below) seem to have a far greater realm of validity that might naively 
be expected. The idea that many genetic mutations are effectively neutral, that is do not 
significantly affect the fitness of the carrier, was pioneered by Motoo Kimura 0, H] • Thus 
changes in their frequency are due to chance, rather than selection. This is not to deny 
that natural selection has an important role in the development of morphological and 
behavioural characteristics, only that random genetic drift has a more significant effect 
that had hitherto been envisaged. The relationship of the neutral theory to selection 
was explored further with the introduction of the concept of "near-neutrality" [5], Ej, in 
which the extent to which genes are affected by drift and selection is a function of the 
effective size of a breeding population. 

In ecology the concept of neutrality is more recent, and is frequently associated 
with Stephen Hubbell, who has developed and popularised the concept during the last 
decade [7j. The idea of neutrality in ecology can be explained by imagining a set of 
similar species in a local community, trees in a forest, for example. When a particular 
tree dies, it is replaced by an "offspring" of one of the other trees in the forest — picked at 
random. This is similar to the Wright-Fisher model, except that now individuals (trees) 
are the fundamental entities, not genes, and they come in different species, rather than 
as different alleles. As in the genetic case, successive random events will eventually 
lead to the extinction of one species by chance, and then another, until eventually only 
one will remain. That this does not happen is postulated to be due to new individuals 
occasionally being introduced from outside the local community by immigration. The 



Stochastic Models of Evolution in Genetics, Ecology and Linguistics 



4 



distribution of species abundances calculated from neutral theory gives a remarkably 
good fit to data [8], however the extreme simplicity of the model has made it very 
controversial. Some ecologists see it as a good "zeroth-order approximation" on which 
to build a more complete description [9], while others believe some other kind of starting 
point is required pm E] • 

The analogies to a model of language change [121 DSl HS1 EE] are a little more 
abstract. Here, there is a retained memory of usages of a particular variant of a 
linguistic variable (such as a vowel sound or grammatical structure); when a variant is 
reproduced, its retention in a speaker's memory displaces a record of an earlier utterance. 
Thus a speaker's perception of the frequency with which a particular variant is used— 
which in the quantitative version of the model which has been developed [17], is taken 
as the frequency with which that same speaker produces that variant — also exhibits 
genetic-drift-like fluctuations over time. The imitation of linguistic variables according 
to usage frequencies experienced by individual speakers falls within a paradigm referred 
to as the usage-based model in linguistics [H] (this to it distinguish from theories in 
which aspects of the grammar come "pre-programmed"). Investigations of the model of 
language change introduced in jTTj have only just begun, but it is already clear that the 
structure of the network of speakers, shaped by both geography and social interactions, 
is important in determining the time needed for a particular variant to become fixed. 
In some cases, such as in new dialect formation [T9|, ED], fixation of linguistic variants 
has been seen to occur on the timescale of a human generation: it is apparently possible 
that this may occur entirely through random effects, without the need to invoke any 
selection mechanism [2~T] . 

In the next section of this article, we present a precise definition of the model 
of neutral genetic drift in an ideal population, along with the non-ideal generalisation 
of a subdivided populations. We also demonstrate concretely the relationship to the 
ecological and linguistic models just described. Then, in Sections [3] and H] we describe 
two complementary analytical approaches. The first starts with a master equation and 
leads to a diffusion approximation for allele frequencies; this type of approximation 
was popular in the 1950s, largely through Kimura's efforts. Although implicitly used 
previously, the 1980s saw the formalisation of an approach called the coalescent, in which 
the statistical properties of the ancestry of a present-day population yield additional 
insights into the population dynamics. Finally, in Sections [5] and [6] we review some of 
the special features of non-ideal and non-neutral evolution respectively, before making 
general remarks in the conclusion, Section [7J 

2. Genetic drift: the Wright-Fisher and Moran models 

2.1. The Wright-Fisher model 

The simplest, and most widely known [221 E31 EH EE], model of random genetic drift 
is the Wright-Fisher model [H Ej- Suppose we have a population of individuals in 
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Figure 1. Wright-Fisher 'beanbag' population genetics. The population in generation 
t + 1 is constructed from generation t by (i) selecting a gene from the current generation 
at random; (ii) copying this gene; (iii) placing the copy in the next generation; 
(iv) returning the original to the parent population. These steps are repeated until 
generation t + 1 has the same sized population as generation t. 



generation t who mate randomly to produce the new generation t + 1. We focus on one 
particular gene which may be in one of two states (alleles) which we call A and B. The 
Wright-Fisher model is based on the idea of the gene pool of the t generation individuals 
being sampled to give the genetic structure of the t+1 generation. In reality, as discussed 
in section [I] individuals are diploid, but for convenience we assume that individuals are 
haploid, that is, there is one-to-one correspondence between individuals and genes. The 
process of constructing the new generation from the current one is analogous to sampling 
coloured balls — representing different alleles (types of gene) — from a bag or urn. The 
steps in the process are illustrated in Fig. [TJ 

Suppose that in generation t there are n' A alleles and (N — n') B alleles. We 
wish to sample this gene pool iV times (with replacement) in order to create the new 
generation t + The probability that in N trials of sampling the pool will get n A 
alleles is simply given by the binomial distribution: 



n 



where p\{n') is the probability of picking an A (= n'/N) and P2(n') is the probability 
of picking a B (= (N — n')/N). So the probability that there is a transition from the 
original state (characterised by n' A alleles) to the new state (characterised by n A 
alleles) is given by 

Pnn>=( N )(i) ■ (2) 



It should be noted that many mathematicians will write p n > n when they are referring to 
the transition probability from the state n' to the state n, whereas we follow the usual 
convention in physics and denote this by p nn >. This makes much of the matrix algebra 
more natural — involving post-multiplication of vectors, rather than pre-multiplication. 
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What we wish to calculate is the probability of finding n A alleles in generation 
t, given that there were n initially. We denote this probability by P(n,t\n ,0), which 
we will frequently abbreviate to P(n,t), taking the initial condition as given. Since the 
construction of the gene pool at generation t + 1 only depends on the state of the gene 
pool in generation t, and not on its state in earlier generations, the process is Markov. 
Theoretical biologists, like mathematicians, prefer to model such processes in terms of 
Markov chains. Thus they define firstly, a probability vector 

MM) \ 



P{t) 



P(2,t) 
P(3,t) 



V 



and secondly a transition matrix 
/ Pn P12 ■ ■ 



V 



P21 P22 



V 



The evolution of the system is then given by the matrix equation 
P(t + l)=VP(t) or P(n,t + l) = Y / Pnn'P(n',t). 



(3) 



Note that the sum of the components of the probability vector must add up to 1, and 
the sum of the entries in any one column of the transition matrix must add up to 1 
(since ^ n P{n, t) = 1 and J2nPnn' = !)• Matrices whose column entries add up to 1 
are known as stochastic matrices, although the convention of writing the initial state as 
the index on the left means that in the literature the statement is usually that the row 
entries add up to unity. It is now straightforward to formally express the state of the 
system in generation t in terms of this initial state: 

P(t) = VP(t - 1) = VV P(t -2) = ...= P'P(O) . (4) 

Clearly the large t behaviour will be dominated by the largest (in magnitude) eigenvalues 
of V and their corresponding eigenvectors. The eigenvalues of the Wright-Fisher model 
are known [26], and are given by 

X r={ r )W> ^0,...,iV. 

Some simple deductions may be made from the Wright-Fisher model formulated 
as a Markov chain. For instance, we may ask what the system looks like after a large 
number of generations. If it tends to a stationary state, and if we call this stationary 
probability vector P_* , it follows that VP* = P*. That is, P* is a right eigenvector of 
V with eigenvalue 1. Intuitively it seems clear that the final state is either all A (the 
A allele has become fixed) or all B (the B allele is fixed). This would correspond to a 
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Here II is the probability that allele A will eventually become fixed. If we calculate 
V P_* using the above expression, we find that it is indeed equal to P*, confirming that 
P* is of this form. 

Another simple deduction is that the expected number of A alleles does not change 
from one generation to the next: 

(n(t + l)) = J2nP(n,t + l) = J2nY,Pnn'P(n',t) 

n n n ' 

= $>'P(n',t) = <"(*)>, (7) 

n> 

since J2n n Pnn> = Npi = nl . In particular, (n(t)) = (n(0)) = no. We can use this result 
to determine II in Eq. ([6]). To do this we note from Eq. ([6]) that as t — > oo only two 
states are possible: n = N (with probability II) and n = (with probability 1 — IT). 
Therefore 



lim(n(t)) = ATI + 0(1 -IT) ^n = NU. 



t— »oo 



(8) 



This gives the explicit form for P* and also tells us that the probability that eventual 
fixation of A will occur is uq/N. 



2.2. The Moran Model 

Statistical physicists are not used to formulating stochastic processes in the way 
described above for the Wright-Fisher model; they usually use continuous time master 
equations and define transitions for a single stochastic step rather than for N bundled 
together [27] . A variant of the Wright-Fisher model, which is closer in spirit to stochastic 
processes considered by statistical physicists, was introduced by Moran [28] . This model 
does not have a fixed "previous generation" from which N genes are sampled (with 
replacement) to form the next generation. Instead, genes are chosen to die one at a 
time, and replaced by new ones which are A with probability p\ and B with probability 
P2- These probabilities are calculated from the state of the system prior to the death of 
the chosen individual: p\(n) = n/N and P2{n) = (N — n)/N, where n is the number of 
A alleles just before the combined birth-death event. 

The Moran model does not have non-overlapping generations, as in the Wright- 
Fisher model: after iV sampling events in the Moran model, each gene need not have 
been replaced, and several might not have survived very long at all. It is therefore said 
to have overlapping generations. Due to the impossibility of unambiguously identifying 
a "generation" in this generation in the Moran model is frequently defined as 
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the period over which a single sampling event takes place. This is a confusing use of the 
word "generation" , but at least it is well defined. So, in one generation the population 
of A may go up by one (if a B is chosen to die, and an A is chosen to be born) or down 
by one (if an A is chosen to die, and a B is chosen to be born). The number of A alleles 
will remain unchanged if both an A is chosen to die and to be born, and also if a B is 
chosen to die and to be born. This gives the transition matrix to be 

-§)pi(n), ifn' 
^ Pl (n) + (l - §) p 2 (n) 



Pn 







= n+ 1 
if n' = n 
if n' — n — 1 
otherwise , 



(9) 



from which it follows that J2 n 'Pn'n = 1- 

The explicit expressions for the non-zero elements of the transition matrix of the 
Moran model are therefore 



Pn+l'i 



Pn 



Pn—ln 



n 
N 



+ 1- 



n 
N 



1 



n 
N 



(10) 



The transition matrix (Q for the Moran model is much simpler than for the Wright- 
Fisher model, being tri-diagonal. Therefore it is not surprising that the eigenvalues of 
the transition matrix (Q can, as for the Wright-Fisher model, found exactly [28] : 

r(r — 1) 



A, 



1 - 



N 2 



0. 



N . 



but, unlike the Wright-Fisher model, the corresponding eigenvectors are also known [29|. 
The two largest eigenvalues are both unity; they have left eigenvectors (1, 1, . . . , 1) and 
(0, 1, . . . , N) and corresponding right eigenvectors (1, 0,0,..., 0) T and (0, 0, ... , 0, 1) T . 
Clearly the last two column vectors correspond to the loss and fixation of the A allele. 
The third largest eigenvalue, A2 = 1 — (2/N 2 ) has the left eigenvector (0, N — 1, 2(iV — 
2),...,n(iV-n),...,0) and right eigenvector (-(1/2) (N - 1), 1, 1, . . . , 1, -(1/2)(JV - 
1)) T [28] [29] . This column vector governs the behaviour, after a large number of 
generations, of the non-fixed states. In particular, we have, once t is sufficiently large, 







P(t) 



V 



no 
N 



/ 



+ 



6n (N - n ) 
N(N 2 - 1) 



/ 



(N-l) \ 
2 



V 



1 



1 - 



N 2 



(12) 



where uq is the initial number of A alleles. We see that the unfixed states are equally 
populated except at the boundaries, in agreement with graphical representations of 
the pdf displayed in, e.g., [21 [23], which show it to be essentially flat away from the 
boundaries after many generations. 
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2.3. Mutations in the Wright-Fisher and Moran Models 

So far we have been considering situations where the change in composition of the 
population is caused by pure genetic drift — randomness with no underlying deterministic 
behaviour. We now include the effects of mutation: an A allele may mutate with a 
probability of u to a B allele, and a B allele may mutate with a probability of v to an 
A allele. These are probabilities per generation, so once again these quantities in the 
Wright-Fisher model will differ from those in the Moran model by a factor of N. 

To include mutation, one chooses an allele from the current population to die, and 
replaces it with one of a type chosen also with a probability proportional to the current 
population sizes. This is as in the original models. The mutational stage is inserted 
at the end of the process, so the chosen successor is allowed to mutate before being 
introduced into the system to define the new population. In other words, when we pick 
an allele to be the "parent" of a "child" in the next generation, the offspring can mutate 
with the probabilities u or v. For instance, if an A allele is chosen, there is a probability 
of (1 — u) that the replacement allele is also an A and of u that it is a B. So, with 
mutation added, 




and 

p 2 (n) = u + (1 - v) (l - . (14) 

The transition matrices for the Wright-Fisher and Moran models are again found from 
Eqs. ([1]) and fl§J), but using p\ and p 2 given by Eqs. (fT3|) and (fT4"l) . So for instance, the 
non-zero elements of the transition matrix for the Moran model are 




Allowing for the possibility of mutation means that the expected number of A 
alleles is no longer conserved. For the Wright-Fisher model this can be calculated as in 
Eq. ([7j), but now 

Y,np nn , = N Pl = (l-u)ri + v(N-ri), (16) 

n 

and so 

(n(t + 1)) = (1 - u)(n(t)) + v (N — (n(t))) . (17) 
For the Moran model we calculate J2 n n Pnn' directly from Eq. ([151) and find that 

w« + D> = <»(«)> us) 
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which is of the same form as Eq. (1171) . apart from the factors of iV dividing the 
mutation probabilities. Eqs. (JTTj) and ([TBI) give the number of A alleles in an infinite 
population, where no fluctuations are present. Written in terms of allele frequencies, 
a t = (n(t))/N and b t = 1 — a t , Eq. (JT7J) is nothing else but the deterministic equation 
for mutational change, a t +i = (1 — u)a t + vb t , found in many textbooks on population 
genetics [221 1231 123] ■ 

This difference equation for a t may be easily solved by writing a t = v/(u + v) + c t , 
so that ct+i = (1 — u — v)ct- The equation for c t can be solved iteratively to give 
Ct — (1 — u — v )'c , which yields 

v 



a t = h (1 — u — vf 

u + v 

V 



fl 



(1-u-vy +ao(l -u-vf . (19) 



u + v. 

i — [ i — a i i r an fl — u — V x * 

u + v 

Since < 1 — it — v < 1, a* — > + u) as t — > oo. 

The addition of mutations also means that neither the A nor the B allele can 
become fixed — there is always the possibility of an allele being created by mutation, 
even if there are currently none present in the population. Thus the behaviour of both 
the Wright-Fisher and Moran models with mutation, after a large number of generations, 
is more complicated to obtain. Nevertheless, the eigenvalues of the transition matrices 
are known. For the Wright-Fisher model [26J: 

X r = (l- u -vy^)^, r = 0,...,N, (20) 

and for the Moran model 12 



X r = l-(u + v)^-(l-u-v)^j^, r = 0,...,N. (21) 

The Moran model with mutation can be solved analytically [301 Si]- 111 addition, 
approximate methods exist, such as the diffusion approximation discussed in Section 
[H which describe these models well. Finally, all of the models can be generalised 
to the cases with two sexes and diploid individuals [29]. These models are more 
complicated, but do not in our view make the underlying ideas or the connection between 
different approaches any clearer. For all these developments we refer the reader to the 
literature on the subject [291 E31 [251 E3]; meanwhile, in Section [5] we shall examine how 
sexual reproduction and diploidy are handled in models that are more easily studied 
analytically. 



2-4- Migration between Islands 

In the models of genetic drift considered so far, new individuals were produced from 
their "parents" randomly. If the organisms had been diploid, we would have said that 
the mating was random. This randomness assumption may model some situations, for 
example, if sperm is released into a pond or lake by a group of male fish in order to 
fertilise eggs laid by females nearby. However, there are many instances where mating is 
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not random. One of the most interesting is when groups of individuals are geographically 
separated. Then, if there is little migration between the populations, so that they are 
effectively isolated, stochastic effects mean that they will develop in different ways. For 
instance, some alleles may be lost in some of the populations, but not in others. 

Models which describe the population genetics of subdivided populations which are 
partly isolated from each other are called "island models" and were first investigated 
by Wright [2]. The island populations themselves are referred to as demes. In early 
models, migration between demes was essentially independent of the distance between 
the islands which contained them, that is, each deme interchanged genes equally with 
every other deme. More elaborate models soon followed, such as stepping stone models, 
where the rate of migration between demes is a function of the distance between the 
islands j32j [33l [34] . These models can be generalised by defining a migration matrix, 
gij, which specifies the probability that, if an individual in deme j has had an offspring, 
the later immediately migrates to deme % (so that it has arrived at the start of the next 
generation). 

The inclusion of migration into the models so far considered in this section is 
in principle straightforward, but in some cases the matrix of transition probabilities 
becomes cumbersome to write down. There are also potentially several different models 
which can be constructed depending on the order in which the various processes of 
birth, mutation and migration take place, how many migrants are permitted into and 
out of each deme, and so on. The introduction of migration into Moran-type models 
is, as expected, easier to formulate and seems more natural to statistical physicists. 
In the usual spirit of this approach, migrations will be considered as single events, for 
instance, a migration of a gene from deme j to deme i. This is in contrast to many other 
approaches to stochastic models of migration, including one introduced by Moran |35j, 
where a fixed number of individuals are chosen to migrate from one deme to another. 
We shall consider the class of models for which migration can be characterised through 
the matrix g^ defined above: this class includes the simple cases considered by early 
workers, such as uniform migration rate between all demes or stepping stone models. 
The number of demes will be taken to be L and we will assume that they will each 
consist of ./V genes. For simplicity we will assume that the mutation rates will be the 
same on each island. In other words, each island will be an exact copy of the systems 
considered previously in this section, but coupled together by migration processes. 

To illustrate the basic idea, suppose that there are only two islands, not L as 
assumed above. Since each island contains exactly iV genes, only two integers are 
required to specify the state of the system: n\ and n 2 , where is the number of A 
alleles in deme i, where i = 1,2. We begin with the basic Moran model of section [2~2l 
where only the birth/death process is considered. When migration is added, a "death" 
is not necessarily followed by a "birth" — it can be followed by a migration event. The 
algorithm we use is as follows. First, we randomly pick a "parent", or gene to be 
copied, from deme i with probability f\. Suppose this parent is on island 2. Then with 
probability gi 2 a migration event takes place, which entails a death on island 1, and the 
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P(ni+l,n 2 ) (ni,n 2 ) 


= /l 


(1 


- 021 


P(n 1 -l,n 2 ) (ni,n 2 ) 


= /l 


(1 


- 921 


P(ni,n2+i) (ni,n2) 


= /a 


(1 


~ 912 


P(ni,n,2— 1) (m,n2) 


= /a 


(1 


~ 9X2 



copied gene from island 2 occupying the space that was left behind. With probability 
922 — 1 — 9i2 we assume that no migration event takes place and a death takes place on 
island 2, with the copied gene from island 2 replacing it. Note that the total probability 
per generation that an individual migrates from deme % to deme j per generation is gjifi- 
Since N is fixed for each deme, if B increases by 1 on island i, this is equivalent to 
A decreasing by 1 on island i. So the four independent transition probabilities are given 
by 

NJ N ^ J2y12 \ NJ N ' 
. n 2 \ ti 2 , n 2 / nA 

1 -n)n +s ^n\ 1 -n)^ (22) 

with 

P(ni,n 2 ) (ni,ri2) = 1 P(ni+l,ri2) ("li^a) 1)«2) («i,«2) 

P(ni,n 2 +1) (ni,n 2 ) P(m ,n 2 — 1) (ni ,n 2 ) • 

(23) 

We can obtain the deterministic equation, valid in the limit N — > oo, describing the 
effect of migration on the number of A alleles on one island, by using the same method 
as was used to find Eq. (Tl8i) . describing the effects of mutation. In this case we calculate 
J2nx,n 2 n iP{n 1 ,n 2 )(n' 1 ,n' 2 ) from Eqs. (j22J) and ([23]) to find that 

(n x (f + 1)) = <m(t)> + ^ - (m(t)>] , (24) 

which is of the same form as Eq. (fT8l) for mutations. The corresponding expression for 
the expected number of A alleles on island 2, (n 2 (t + 1)) is obtained by exchanging all 
1 and 2 indices. Using these expressions one can show that the weighted mean 

fi{t) = /i 5 2i(tii(0) + f29i2(n 2 (t)) (25) 
is conserved, i.e., n{t + 1) = n(t), whilst the difference 

8n{t) ee (m(t)) - (n 2 (t)) (26) 
decays geometrically since 

fl921 + /2<7l2 



<Jn(t + 1) 



1 



tfra(t) . (27) 



AT 

In other words, migration results in a convergence of the mean allele frequencies on the 
two islands to a common value which in turn eventually reaches a stationary value n* 
that can be worked out from equation ([25]) once the initial values ni(0) and 712(0) are 
known: 

* toini(O) +/ 2 £i2n 2 (0) 
7i = — . (28) 

71021 + 72012 
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Population genetics 


Population ecology 


Language evolution 


Efene 


individual 


token 


allele 


species 


variant 


island 


patch 


speaker 


deme 


local community 


grammar 


population 


met acommunity 


speech community 


mutation 


speciation 


innovation 


migration 


immigration 


conversation 



Table 1. The correspondences between constituents, concepts and processes in 
population genetics, population ecology and language evolution. 

2.5. Mapping between population genetic, ecological and linguistic models 

In section [1] we have already discussed the analogies that can be drawn between models 
of population genetics, individual based models (IBMs) in ecology and models of the 
evolution of language. The mathematical formalism we have introduced in the current 
section allows us to make these analogies more concrete, and to define mappings between 
models in these three distinct areas. 

The genetic models which we have described have been of a gene at a single position 
(locus) on a chromosome, involved only one chromosome, and has not included sex: new 
genes were "born" from a single parent. In this very simplified genetics, we can directly 
identify a gene with an individual and vice-versa. In this case different alleles can be 
thought of as different species. Therefore in the context in which we have been working, 
the mappings [genes <-» individuals] and [alleles <-> species] is very natural. The idea 
of an island is common to both population genetics and population ecology, where it is 
called a patch, and we have the further correspondence [deme «-> local community]. 

To extend these mappings to the evolutionary model of language change [T2l [13j 
fT5l [T6] described briefly in section CD, we begin by noting that the central idea of this 
approach is to make an analogy between "different ways of saying the same thing"— 
called variants of a word or set of words, and alleles. The specific "word or set of words" 
under consideration is called the lingueme. Thus the first analogy is [variants «-> alleles] . 
In the model [17], each speaker's utterances may contain different variants of a lingume. 
In the case of two variants, the speaker's grammar can be modelled as being made up of 
N tokens, of which n are of one variant and N — n of the other. An utterance by speaker 
% will then produce tokens which modifies his own grammar (death and birth of tokens 
belonging to speaker %) and which may modify the grammar of speaker j (migration of 
tokens from speaker i to speaker j). This gives the further analogies [tokens genes] 
and [speakers <-> islands] . Table [1] summarises the mappings between these different 
areas. 

While we have so far assumed that there are only two alleles, A and B, in the case of 
population genetics, the analogies with population ecology and evolutionary linguistics 
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make it clear that we will frequently be interested in having more than two species or 
variants in our models. Therefore in the next section will generalise the models so that 
there are M possible types of gene/individual/token. 

The interpretation of "genetic drift" , mutation and migration in population ecology 
and evolving linguistics is interesting. The species being considered in the analogy 
are not to be thought of as predators and prey, but rather as species of a similar 
type ( "trophically similar species") competing for a common resource, such as space 
or light. The fact that when a death of an individual occurs, it is immediately replaced 
by another (either through a birth or by immigration) means that there is intense 
competition; as soon as a "vacancy" occurs, it is immediately filled. This coupling of 
the death rate to the birth/immigration rate so that the overall population remains 
fixed, is called the "zero-sum rule" in biology. Models where trophically similar species 
(for example, forests of trees) change their population sizes through purely random 
dynamics analogous to pure genetic drift, are called neutral models [TJ. They are, at 
present, quite controversial [36], mainly because measurable "physical" quantities such 
as the species abundance distribution arise as a result of random effects. In the context 
of neutral theories a mutation corresponds to speciation, where a single individual of a 
species changes into an individual of a different species. This is, of course, a very coarse- 
grained description of a very complex process. Immigration has the same interpretation 
as migration in the genetic case. 

In the linguistics application, we think of different linguistic variants competing for 
use by members of the speech community. Here, the analogue of "trophically similar 
species" would be functionally equivalent variants, or "two ways of saying the same 
thing". For true neutrality, one would further require that no speakers ascribe any 
social value ("prestige") to a variant, which might cause them to reproduce it more (or, 
indeed, less) often they have heard it used thus far. There are many ways in which 
new variants might be created, a process referred to as innovation p3] . Basically, most 
of these arise from speakers needing to communicate something that they have not 
communicated before; other forces behind innovation include articulatory constraints 
which may involve speakers producing one variant, even if they aim to produce another. 
This latter case, in particular, could be modelled by mutation rates between variants, 
as we have described for the genetics application. Migration corresponds to the process 
whereby tokens from the grammar of one speaker are incorporated into the grammar of 
another speaker with whom the first is conversing. Two factors contribute to the overall 
rate at which tokens migrate from one grammar to another: first, the is the frequency 
with which two speakers converse; secondly, speakers may give higher or lower weights to 
the utterances of other speakers based, for example, on the listener's perception of how 
similar the interlocutor is to themselves, or whether they wish to acquire social status 
by imitating variants associated with a prestigious group. Such sociolinguistic effects 
are discussed in more detail in [T3]. Meanwhile, it has been proposed [20] that when 
dialects are brought into contact, as happened for example when people from Britain 
and Ireland migrated to New Zealand, a truly neutral theory (i.e., one in which these 
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social factors are claimed to be unimportant) should describe observed effects, such as 
convergence to a relatively homogeneous dialect among a large community of speakers 
on a timescale of order of a human lifetime. Whether a genetic drift-like process actually 
allows for such behaviour is another matter of debate and is under current investigation. 

In this section we have introduced the simplest stochastic models of population 
genetics and brought out analogies with similar models in population ecology and 
evolutionary linguistics. In the next section we will write these models in a form more 
familiar to statistical physicists, generalise them, and also show that when the number 
of genes/individuals/tokens for an island/patch/speaker is large, they may be replaced 
by models based on Fokker-Planck equations, rather than on master equations. 

3. Master and Fokker-Planck equations 

As we have seen in Section El the Wright-Fisher model is quite difficult to analyse, 
especially when mutation and migration are allowed for, and so very early on workers 
resorted to the diffusion approximation [37 1 1381 139] . This is a description of the process, 
valid when N is large, where (a) the (rational) allele frequencies n/N are replaced by 
real numbers x, < x < 1, and (b) the generational label, t, is rescaled by a factor 
of 1/N. Specifically, t now measures units of N generations, so that one generation 
becomes only 1/N units of t. The governing equation is now a Fokker-Planck equation 
for P(x, t'\xq, 0) which for pure genetic drift has the form 



where r' = t/N. In essence, this is a mesoscopic description, whereas the Wright-Fisher 
equation was a microscopic, or gene-based description. It is possible to add terms to 
Eq. fl29|) which account for mutations and migrations. 

In the last section we also described how the Moran model, while similar to the 
Wright-Fisher model, was in many ways easier to write down and generalise. Therefore 
in this section we will formulate the basic stochastic processes that interest us in terms 
of Moran-type models, rather than Wright-Fisher models. An advantage is that Moran- 
type models may be formulated as master equations, a step that is not typically followed 
in the mathematical biology literature. The advantage of formulating a stochastic 
process in this way, is that there there then exist systematic methods to approximate 
the master equation for large N as a Fokker-Planck equation [27]. It will turn out that 
the Fokker-Planck equation corresponding to the large N limit of the Moran model has 
the same form as Eq. fl29|) . that is, has the same limit as the Wright-Fisher model. 

We will begin by discussing the reformulation of the Markov chain based description 
of population genetics given in Section El to the master equation description which we 
will use in the rest of the section. 




(29) 
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3.1. Master equations and population genetics 

We begin by rewriting the standard dynamical relation for a Markov chain, P(n,t + 1) = 

Y,n'Pnn'P(n',t) as 

P(n, t+1)- P(n, t) = J2Pnn'P(n', t) - $>„> n P(n, t) , (30) 

n' n' 

using J2 n ' Pn'n = 1- Now the terms in the sums with n' = n may be omitted — since they 
cancel between the two terms on the right-hand side — and so only the terms involving 
an actual transition, that is, terms with n' ^ n, appear in the equation. 

Up until this point, we have been careful not to refer to t as a "time" — it has simply 
been an integer labelling generations. However now we interpret t as time divided into 
intervals At taken to be sufficiently small that at most one sampling event occurs during 
it. Dividing Eq. ( !30l by At, and omitting the terms with n' = n, yields 

P{ n,l + U)- PM (*W) PKi) _ £ (-) P(M , (31) 

We now adopt a different view of the stochastic process. Instead of assuming that 
one exactly one sampling event happens per generation (including n — > n, where no 
transition actually occurs), we now have sampling events occur at unit rate so that on 
average one event takes place per generation. Once t is sufficiently large, by far the 
most likely number of sampling events that will have taken place will be equal to t. 
Therefore one expects the properties of the continuous and discrete time processes to 
concur at large times. Additionally, we replace the transition probability from state n' 
to n per generation p nn i with the transition rate per unit time T(n\n'). 
These two quantities are related via the expression 

p nn , = T(n\n')At + O(At) 2 , n ^ ri (32) 

where the terms of order (At) 2 and higher represent the probability of two or more 
events happening in time At. Substituting (1321) into Eq. (131j) . and letting At — > 0, gives 

= £ T (n\n') P(n', t) - £ T(n'\n) P{n, t) , (33) 

which is the usual master equation used by statistical physicists [27J HQl SI]- Typically, 
the transition rates T(n\n') are assumed given (this specifies the model) and we wish to 
determine the P(n,t). The initial condition is that the system is in state Uq at t = 0. 
Boundary conditions may also have to be given. 

To illustrate these ideas we will look at three examples, all of which were introduced 
in Section [21 We begin with the Moran model for pure genetic drift, which has non-zero 
elements of the transition matrix given by Eq. ( jTOl) . In the continuous time formulation, 
the population evolves by two individuals being sampled uniformly with replacement 
from the population as a Poisson process with the mean time between events taken 
to be unity. One of the sampled individuals is designated the parent which is copied 
to create an offspring; the other sampled individual is sacrificed to make way for the 
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new offspring. To obtain the transition rates in the master equation (1331) . we need not 
consider the matrix element p nn , since this does not involve an actual transition. We 
find therefore 

r( n+ l| n ) =(!-£)(£), r(B -l|„) =(£)(!-£). (34, 

We now turn to the Moran model with mutation, which has non-zero elements given 
by Eq. ffT5|) . We will contrast two ways one might construct a continuous-time process to 
mirror the discrete time process as defined in Section 12.31 The most obvious is to follow 
the prescription given earlier: that is, to have Poisson sampling events as described in 
the previous paragraph, and to mutate the copy of the parent to allele B with probability 
u if the chosen parent was an A or with probability v in the opposite direction. The 
transition rates T(n + l|n) and T(n — l\n) are then given by the expressions for p n +in 
and p n -i n appearing in Eq. ( fl~5l) . i.e. 



2 



r(» + i|n) = (i-«)(i-^) 

T( n -l| n ) ^1 -,)(£) (!-£)+«(£)'. (35) 

The second approach we shall take differs in that the mutation process will 
is divorced from the death/birth process — the two are considered to take place 
independently with mutations occurring spontaneously, rather than between a death 
and a birth. Specifically, we shall sample on average once per unit time a parent- 
offspring pair, as previously described. With probability 7, we replace the offspring 
with a copy of the parent without mutation. The rest of the time, i.e., with probability 
1 — 7, we switch the offspring individual to allele state B with probability U if it is 
in state A; otherwise we switch to state A from B with probability V. This leads to 
transition rates of the form 

T(n + l\n) = 7 (l - £) (£) + (1 - l)V (l - £) (36) 
T( B - 1|») = 7 (£)(l -£)+(!- 7 )I7 (£) . (37) 

To make contact with the discrete-time Moran process, we shall insist that, at any given 
time, the probability that the next event to occur in the continuous-time process is equal 
to that for the discrete-time process, given the same starting configuration. To do this, 
we rewrite p n +in and p n -in in (EES) as 



Pn+ln = (1 - U ~ V) (l - ^] (^] + V ( 1 - 



NJ \NJ V N 
n\ ( n\ ( n 

n) {n) +u {n 



p n „ ln = (1 - u - v) ( 1 - + u ( ^ ) . (3* 



Correspondence is obtained if we put 

v u 
7 = 1 - u - v , V = and U = , (39) 

U + V U + V 

that is, if a fraction u of the time, A alleles spontaneously change state to B alleles, 
and a fraction v of the time, B alleles change to A. Note this leads only to positive 
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transition rates if the mutation probabilities satisfy u + v < 1. Therefore, not all Moran 
processes in which birth, death and mutation are intertwined can be represented by 
a continuous-time model in which mutation occurs independently of birth and death. 
Nevertheless, a model with transitions rates of this form was introduced by Maruyama 
(Model I in the appendix of [25]) and it has been remarked [12] that the results obtained 
by this model and the Moran model seem to be essentially identical. Our derivation of 
the former starting from the latter shows why this is the case. 

We now turn to the the Moran model with migration between two islands, discussed 
in Section 12.41 In this case the state of the system is specified by two integers: 
n = (ni, n 2 ). The considerations leading to the master equation ( l33l) . where the state is 
specified by a single variable, generalise in a straightforward way to the case where the 
state is specified by a set of integers: 

= £ T (n\nf) P(nf, t) - £ T(nf\n) P(n, t) , (40) 

where in the case under consideration here n = (n 1; r^). The transition probabilities are 
given by Eq. (1221) . To construct the corresponding continuous-time process, we take the 
rate at which a parent is chosen in deme j , the offspring of which displaces an individual 
in deme i, to be 

°ij = fm > ( ") 

where we recall that fj is the probability the parent is in deme j, and gij the probability 
that the offspring lands in deme i, conditioned on the parent being in deme j. Since 
Yjjfj — 1 an d J2i9ij — 1 f° r an Y j-i the total rate at which pairs of individuals are 
sampled is 

E G « = E/i£^ = 1 ( 42 ) 

i,j j i 

as required, even though the sampling is now non-uniform. 
The transition probabilities for this model then follow as 

T(m + 1, n 2 | ni , « 2 ) = G n (l - ^) ^ + G la (l - ^) ^ , 
T(m - l,m\ ni , n 2 ) = G n (l - ^) ^ + G 12 ^ (l - 2?) 

T (ni , „ 2 - l| ni , n 2 ) = G 22 (l " |) | + G 21 | (l-£). (43) 

Note that the diagonal matrix elements G,, give the rates at which an individuals are 
chosen to reproduce without their offspring subsequently migrating. From the master 
equation ( 1401) and the transition rates ( 1431) . we can obtain the deterministic equation, 
valid in the limit N — > oo, describing the effect of migration on the number of A alleles 
on one island, by calculating d(rij)/dt. Focusing on island 1, we multiply Eq. (1401) by 
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rii an d sum over all n\ and n 2 . By shifting the quantities being summed over by ±1 in 
some of the sums over rii, one finds that 

= ( T ( n i + n 2\ni, n 2 )) - {T(m - 1, n 2 |ni, n 2 )) . (44) 

Substituting for the transitions rates using Eq. ( H3l) yields 

d(nx) _ /(n 2 ) (m)\ 

"dT- Gl3 V"r~"rJ ' (45) 

which is in agreement with Eq. (124)) as one would expect. 



3.2. Fokker-Planck equations and the large N limit 

Having obtained the transition rates (1341) . (157)1 and (143)) for pure genetic drift involving 
two alleles, together with mutation and migration between two islands, we can now 
investigate the form that the governing equations take in the limit where N is large. It 
will turn out that these processes can be described by Fokker-Planck equations in this 
limit. This will allow more complex situations, for example involving L islands or M 
alleles, to be formulated in a rather straightforward way. 

We again begin with the Moran model for pure genetic drift. When formulated as 
a master equation, it has the form (1331) with transition rates (134)) . Such an equation 
is of "the diffusion type", since there is no macroscopic equation and the large N 
approximation results in a non-linear Fokker-Planck equation |27J. In this case, the 
large N limit can be obtained in a straightforward fashion by substituting n = xN, and 
expanding out any remaining terms in N in a 1/N expansion. We find 

dP _(„ „, l \ \ p u t s 1 dP(x,t) | 1 d 2 P(x,t)\ 



dt V NJ \ N J iV dx 2N 2 dx 2 



lw 1\ f . , 1 dP(x,t) 1 d 2 P{x,t) 
+ [x + — l-x- — ){P(x,t) + — + 



NJ \ N J \ N dx 2N 2 dx 2 J 

- 2x(l -x)P(x,t) + 0(1/N 3 ) (46) 

After a little algebra, this reduces to 
BP 1 8 2 

^ = A7w [x(1 - x)P] + ° (1/iV3) - (47) 
So if we define a scaled time r = 2t/N 2 , then taking iV — > oo we obtain 

dP 1 d 2 

which is the diffusion equation (129)) . This shows that two different gene-based 
models — the Wright-Fisher and Moran models — give the same mesoscopic Fokker- 
Planck equation. This is a familiar situation in statistical physics: microscopic models 
describing the same "physics", but formulated in slightly different ways, typically will 
give rise to the same mesoscopic equation. The only difference between the Wright- 
Fisher and the Moran model is that, as usual, the time-scales differ. The redefinition of 
a generation in the Wright-Fisher model leads to r' = t/N, whereas the scaling of time 
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in the Moran model gives r = 2t/N 2 . In other words, a generation in the Wright-Fisher 
model is N/2 times as long as in the Moran model in this mesoscopic description. 

When the diffusion approximation is applied to the Moran model which includes 
mutational and migrational effects, the rates of migration and mutation have to be 
rescaled by a factor of N. If this were not so, the large N limit of the master equation 
would not be a Fokker-Planck equation of the diffusion type, but a linear Fokker-Planck 
equation which would describe fluctuations about the deterministic equation (fT7|) [27J. 
In the backward-time formulation discussed later (see Section HJ), we will also see that 
this scaling corresponds to a regime in which all relevant processes affecting the ancestry 
of the population occur on the same timescale. 

Let us first consider the models with mutation. The scaled mutation rates, with 
the factor of N mentioned above and a factor of 2 to be explained below, are defined by 

K = f , V=f. (49, 

In Section 13.11 we discussed two ways of introducing mutation into the Moran model. 
The first resulting in transition rates given by Eq. ( 1351) and the second in transition 
rates given by Eq. (1571) . with 7, U and V given by Eq. (I39I) . There is no need to carry 
out the 1/N expansion for each model in turn, since the transitions rates for both are 
identical (although one has the restriction u + v < 1), and which using the scaled forms 
(|49|) read 

T(n + l\n)= l__ _ +— 1- 



N N J \ NJ \NJ N \ N j 

( 2U 2V\fn\[ n\ 2U ( n \ 
T(n - l\n) = (i -_-_)(_) (i - _)+ _ (_) . (50) 

Since the term ( H71) is already of order 1/N 2 , the terms involving U/N and V/N which 
multiply it give terms of order 1/N 3 , which do not contribute to the Fokker-Planck 
equation. Therefore the only additional terms due to mutation are the second terms on 
the right-hand side of the transitions probabilities (1501) . which are 



1 dP(x,t) 



x {P(x,t) + 




0(1/N 3 ) (51) 



N dx J 
which on simplification yields 

A|_ [{Ux - V (l -x)}P] + 0(1/N 3 ) . (52) 

Adding this additional term to Eq. (1471) . and using rescaled time units r = 2t/N 2 , we 
find on letting iV — > oo that the additional mutation processes present in either model 
give rise to the Fokker-Planck equation 

^ = §~ x Wx -V{l-x)}P] + \^ [x(l - x)P] . (53) 

The restriction u + v < 1, which applied to one of the models, takes the form 
U + V < N/2, when expressed in terms of the scaled mutation rates. Since in most 
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applications U and V are numerically small, and in any case we are taking the limit 
iV — > oo, this condition will always hold. Furthermore, Fokker-Planck equation (1531) 
agrees with that found by applying the diffusion approximation to the Wright-Fisher 
model with mutation [23J. The additional factors of 2 introduced in Eq. (1491) were 
included to conform with the conventions used in the Wright-Fisher case. 

The introduction of two islands or speakers has been discussed in section 12.41 and 
the transition rates given in Eq. fj43l) . To derive the Fokker-Planck equation from the 
master equation in this case, we note that the first terms on the right-hand sides of 
these transition rates are exactly as for pure drift, up to the inclusion of prefactors G\\ 
or (?22- Therefore, the contribution from events which do not involve actual migration 
is 

^^l^-^P] + 0(l/N"). (54) 

The second terms on the right-hand side of the transition rates in Eq. (J43]) , for migration 
from island j to island i are: 

1 dP(x, t) \ 




P(x,t) 



+ Gij [Xi + — (1 - Xj) < P(x, t) + 



N dxi j 
1 dP(x,t) 



NJ" 1 N dxi J 



- G^ [(1 - Xi) xj + x t (1 - Xj )] P(x, t) + } 

In order for the Fokker-Planck equation to be of the diffusion type, the rates of migration 
between demes, i.e., G^ for i ^ j, have to be scaled by a factor of N, just as the mutation 
rates were. Let us now parametrise these off-diagonal matrix elements as G^ = 2Qij/N 
where Qij is of order unity. Then, Ga = fi + 0(l/N), and so the Fokker-Planck equation 
becomes 

dP(x, t) I' d 2 

-^ = W^ fi dx] [Xi(l - Xi)P] 

+ (56) 
Introducing r = 2t/N 2 , as before, we can let iV — > oo and obtain the Fokker-Planck 
equation for migration between two islands: 

BP 1 2 d 2 

In this mesoscopic formulation all trace of the individual genes has disappeared, and we 
are left only with a description in terms of the fractions of genes on island % which are 
of type A. 
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3. 3. Migration of genes having M alleles between L islands 

We have so far, for simplicity, mainly restricted ourselves to considering two islands and 
two alleles. However the formalism we have discussed carries over with little modification 
to the cases of an arbitrary number of island and alleles. In order to avoid introducing 
too many complicating looking expressions in this section, we have relegated some of 
the details to an appendix. 



3.3.1. L Islands We assume that on each island we have a deme labelled by i, with rij 
being the number of A alleles and (N — nA the number of B alleles. A parent is chosen 
from island j with probability fj and subsequently leaves an offspring on island i with 
probability gij, where the sum rule gu — 1 — Y^jm 9j% m ust be satisfied. Then proceeding 



as in Section 12.41 the transition probabilities analogous to those in Eq. (122]) are: 

P(m,...,ni+l,...,n L ) {rii,...,n L ) = 




(5f 



and 



P(n 1 ,...,ni-l,...,n L )(n i ,...,n L ) — fi ( 1 E 9ji ] (l yy y 



71* \ 71,; 



To go over to a master equation description, we introduce the set of migration rates via 
Eq. (141j) . The transition rates for the master equation (140j) now follow: 

(Tl'\ Ti ' 
1 ~ iv^J TV ' 

T(m,...,ni - 1,. . .,n L \n u . . .,n L ) = E^i"^ ( X _ ^) ■ ( 60 ) 

To obtain the Fokker-Planck equation, valid when iV is large, is now very simple, since 
the required calculations are as in the L = 2 case. For example, Eq. (|54p is unchanged, 
except that the sum is now from i = 1 to % = L, and the contribution (|55j) holds for all 
pairs i and j with i ^ j. Introducing as before = 2Qij/N for i ^ j, rescaling time 
as r = 2t/N 2 , letting N —>■ oo and noting that Gu — * fi in this limit, we obtain the 
Fokker-Planck equation for migration between L islands: 

dP 1 L d 2 

7*=2g%ffo<i-*)i1 

+ i; (e«^r - {(x, - x,) pi . (6i) 

Here the notation (ij) means "sum over all distinct pairs i and j" . 



Stochastic Models of Evolution in Genetics, Ecology and Linguistics 23 

Fokker-Planck equations may be written as continuity equations [H]; in the case 
of interest here it takes the form dP/dr + YU dJi/dxi = 0, where 

Ji(x, t)= - \fi^~ Ml - Xi)P(x, t)] - J2 Gij [(Xi - Xj ) P(x, t)] , (62) 

is the probability current. Multiplying the equation of continuity by Xk and integrating 
over all Xj (i = 1, . . . , L) gives 

^■(Xk) f OP r dJi f 

—— = J dxx k — = -J2 J dxxk—^- = J dxJ k , (63) 

since the probability current vanishes on the boundaries jH]. From Eqs. fl62l) and (|63j) 
we have 

^- = -ESk j ((x k )-(x j )). (64) 

To make contact with previous results for finite N, we reintroduce n t = Nxi, Gij = 
2g tj /N and t = N 2 t/2, so that Eq. (JM} becomes 

^ = -T,G k A((n k )-(n J )). (65) 



This is in agreement with Eq. ff45|) found directly from the master equation in the case 
L = 2. 

Let us briefly consider under what conditions the mean of the frequency of A alleles 
across all islands is conserved. For this we require 

4 5>*> = - E E % «n*> - K» =E%^E (G kj - G jk ) = .(66) 

k k j^k j k=£j 

For this to hold for any set of allele frequencies (rij), we require 

E °kj = E G Jk Vj. (67) 

Since the total mean A allele frequency is conserved by migration (and rigorously 
conserved in the absence of drift, not just on average) models for which the migration 
rates satisfy this equality are called conservative. We will see in Section |5] that a number 
of exact results are known for models with conservative migration. 



3.3.2. M Alleles So far in this article we have restricted ourselves to situations where 
each gene has only two alleles: A and B. There is no reason, other than grounds of 
simplicity, to do so, and it is possible to generalise the treatment to the situation where 
there are M possible types of alleles, A a , where a = 1, . . . , M. If one is considering a 
single island, the possible states are labelled by the number of A a alleles denoted by 
n a , where a = 1, . . . , (M — 1). Clearly, since J2a=i n a = 1> n M = 1 ~ Y^aLi^ n a is not 
an independent variable. If there are L islands, then there are L(M — 1) independent 
variables: n ia with i — 1, . . . , L and a — 1, ... , (M — 1). 
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(70) 



We begin by assuming that there is no mutation or migration (that is, L — 1). The 
basic transition probability associated with an Ap allele being replaced by an A a allele 
is 

P(n 1 ,...,n a +l,...,n -l,...,n M _ 1 )(n) = (j^j (jy) ' 

where a (3, a, (3 = 1, . . . , (M — 1). Clearly if the transition involves the Mth allele 
the expression has to be modified, since the number of alleles of type M is not an 
independent variable. The transition rates which appear in the master equation are 
obtained as before through consideration of a continuous-time process in which pairs 
of individuals, one of which reproduces and whose offspring displaces the others, are 
sampled on average once per unit time. This leads to 

T(ni,...,7i a + l,...,7i J 8-l,...,n A f-i|a) = fej (jjj?) , (69) 
where a//J and neither is equal to M, and 

T(ni, . . . ,n a + 1, . . .,n M -i\n) = T(n 1( . . . ,n a - 1, . . .,n M -i\n) 

NJ { N ) ' 

if either allele a M) increases at the expense of allele M or allele M increases at the 
expense of allele a M), respectively. 

We can now carry out a large N expansion on the master equation with the 
transition rates (|69l) and (jTOjl . just as we did in Eq. (jlBT) . The algebraic details are a 
little more messy for general M, and so we have given them in the Appendix. Rescaling 
the time as before, one finds the Fokker-Planck equation 

BP 1 { M - 1 f) 2 M-1M-1 f)2 ~\ 

in the limit iV — > oo. This the usual multi-allelic diffusion equation [43] 

If we now add the possibility of mutations occurring, MiM — 1) mutation 
probabilities have to be defined to describe mutations from allele Ap to allele A a . We 
will denote this probability by u a p (a ^ (3). To see how this matrix enters the formalism, 
let us return to the the case of two alleles and rewrite Eqs. (fT3|) and <HM as 

2 

p a = T, u ^/N), (72) 

(3=1 

where we have written A and B as A\ and A2 respectively, n-i — N — m, and where the 
mutation matrix is given by 

u= . 73 

u I — v 

Notice that the condition that the columns of the matrix u add up to 1 ensures that 
HaPa = 1, since J2p n p — N. For the case of M different alleles u has entries u a p if 
a 7^ P, and 1 — u -y/3 f° r the diagonal entry (/3, (3). Therefore to deduce the form of 
the transition probabilities in the Moran model with M alleles and mutation, we may 
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follow the same arguments which led to Eq. (1151) in the case of two alleles. For instance 
if an A/3 allele is replaced by an A a allele, then 

'np 
.N 



P(m 



.,n a +l,. 



-l,...,n M -i)(n) = Pa{lk) (j^J 



(74) 



(M-1) 
7=1 



n 



7" 



This will hold for all a and (3, with a ^ f3, as long as we interpret um as 1 — X 
The change in the mean value of n a with time can be calculated in an analogous way 
to Eq. ([TBI and one finds that 



(n a {t + l)) 



M*)> + E 



U af 3 



N 



(n a (t)) 
N 



(75) 



As in Section 13.11 we shall follow two approaches to mutation in the continuous 
time formulation. First, we shall have pairs of individuals sampled on average once per 
unit time, and the offspring of the parent changed to allele a with probability u a/ 3 given 
that the sampled parent carried allele (3. Transition rates for this model are then given 
by 



T( ni 



l...np-l... n M -i\n) = E M «<5 




M-1 
a=l 



(76) 



In the 



for a (3 and it is understood that um is always implicitly given by 1 — J2 
Appendix we derive the Fokker-Plank equation for this system, valid when N is large. 
Again, it is necessary to rescale the rates of mutation between different alleles, i.e., 

Nu aP 



aP 



a^(3 



The Fokker-Planck equation reads 

M-1 Q 



OP v 



M-1 M-1 



a=l 



dx r 



d 2 



a=l /3= 



[ dx a dxp 



[V a p{x)P] , 



where A a (x) and T> a p(x) are given by 



M 



E (M<x0 X P ~ Mpa x a) 
P=l 



(77) 



(7f 



(79) 



and 



V a p{x) 



(80) 



J x a (l — x a ), if a — (3 
\ —x a xp, if a 7^ (3 . 

Note that the term in the sum with a — [3 in (1791) is zero, so it is not necessary to define 
the diagonal rescaled mutation matrix elements U aa 

It is worth, briefly, considering the variant of the continuous-time mutation model 
in which mutation events are separated from the birth-death events. As in Section 13. 1[ 
we shall sample pairs of individuals on average once per generation, and a fraction 7 of 
the time replace one of the pair with a copy of the other; the remaining fraction 1 — 7 
of the time, one of the pair shall be mutated to allele a with probability U a p, given that 
it carries allele (3. The transition rates for this model are 



T(ni . . . n a + 1 . . .rip - 1 . . . n M -i\{n) = 7 



na_np_ 
N N 



+ (1-7) 



II 71/3 



51) 
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Compare now with (1761) which can be written as 

T(ni ...n a + l...np-l. ..n M -i\n) = I 1 - ^ u Sa J ^t^t + Ua& ~§~M ' ( 82 ) 

Since no allele frequencies n$/N with 5 ^ a, j3 appear in (IHTI) . the only way the two can 
be made equal is if we take u a s = u a . Then, the sum over n$/N gives 1 — n a /N, since 
all allele frequencies add up to unity This results in 

T(ni . . .n a + l . . .np - 1 . . .n M -i\n) = (^--J2 u s^ ^~§ +Ua ~N ' ^ 

Correspondence with (18"T|) is then obtained as long as 

it 

7=1-£ M 5 and U a = —^, (84) 

where we have taken U a p = U a for all f3 ^ a. We see that, in common with the case 
M = 2 discussed in Section I3.1[ the sum of mutation rates Yls u s must be less than one 
in order for the simplified model (in which mutation is a spontaneous process) to have 
the same dynamics as the full model in which mutation, birth and death are combined. 

We remark that this condition is satisfied in the limit iV — > oo when mutation rates 
are scaled with iV as above. The Fokker-Planck equation f!78|) results, albeit with a 
simplified deterministic (potential) term 

M 

A a (x) =U a -J2 Upx a , (85) 
p=i 

on account of the mutation rates depending only on the end product of the mutation. 
Note that this time, the term (3 = a is included in the summation. Even with this 
simplification, the Fokker-Planck equation is a nonlinear partial differential in (M — 1) 
variables, and one therefore does not expect it to be readily solved. The stationary 
solution has long been known [38] to be 



P*(x u x 2 ,...,x M )=T 2^> Q n^TTT (86) 

in which T(u) is the usual Gamma function. In this steady state, the probability current 
vanishes everywhere, i.e., detailed balance is satisfied. For a general set of mutation 
rates, i.e., one for which the mutation rates depend on both initial and final allele 
states, the stationary solution of the Fokker-Planck equation has not been found. One 
therefore speculates that detailed balance is not satisfied for these more general models, 
which may be related to the fact that one cannot represent these mutation rates in 
models where birth/death and mutation are independent. 

Meanwhile, a remarkable feature of the Fokker-Planck equation under the 
restriction W a/3 = U a is that the full time dependence can be found, not just the steady 
state. There are at least two ways to achieve this; either one can explicitly construct 
the set of polynomials orthogonal to the weight function given by (156]) to obtain the 
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eigenfunctions of the operator appearing in (1781) [44]. Alternatively, one can make the 
change of variables 

= ^ , a = l,...,M-l, (87) 

1 — l^/3<a X f3 

under which the equation becomes separable [45]. Essentially this is possible because 
of the simple way that the M-allele problem is "nested" in the (M + l)-allele problem. 
This is also responsible for most of the simplifications in the calculations of many other 
quantities of interest, such as mean times to fixation, the probability of a particular 
sequence of extinctions, and so on [45]. In this approach, it is also learnt that the 
eigenfunctions of the Fokker-Planck equation are Jacobi polynomials. 

3.3.3. L Islands and M Alleles The final case to consider is when there are M alleles 
and L islands. The analysis is a combination of the M = 2, general L case considered 
in section 13.3.21 and the L — 1, general M case just considered. So, for example, not 
including mutation for the moment, the terms in the transition probability which do 
not involve migration are exactly as in Eq. (|69|) with indices i added to the n a or np, as 
well as an extra factor of /j(l — J2j^i9ij)- Alternatively, the transition probabilities are 
exactly as in Eqs. (1581) and (1391 with indices a and (3 added: 



P(...,m a +l,—,Tiip— l,...) (n) — fi I 1 




Here we have removed an Ap allele from island i and replaced it with an A a allele, also 
from island i (first term) or with a migrant from island j (second term). 

We assume that mutation occurs at the point that a parent is copied to make 
the new offspring, no matter whether that offspring remains on its parent's island or 
emigrates. Furthermore, the probability that a parent with allele j3 gives birth to an 
offspring with allele a does not depend on either the source or target islands, i.e., the 
mutation occurs with probability u a p as before. Including this mutation process, the 
transition probability becomes 

P(...,n ia +l,...,n i0 -l,...) (n) = fi U Pocilk) {jj- 

+ EfwMrkj)(jf) ■ (89) 

As before, a ^ (3 and it is understood that uim should be replaced by 1 — YlyLi n i-y 
The transition rates follow as before. 

Starting from the master equation (T40l) it is straightforward to obtain the Fokker- 
Planck equation using these transition rates, the algebraic steps simply being a 
combination of those which need to be carried out for the cases considered in Sections 
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13.3.11 and 13.3.21 The details are given in the Appendix, where it is shown that the 
resulting Fokker-Planck equation is given by 

d 



pjTD L M-l 

9t a — -I „ — -, dXi a 




X 



P] 



-y Li m — i ivi — i q2 
^ i=l a =l p=l ax ioOX 



)P 



i/3 



(90) 



where A a (x) and T> a/3 (x) are given by Eqs. (1791) and (180|) respectively. 

An equation of a very similar form to Eq. (1901) has been obtained in connection with 
the model of language evolution previously discussed [T7j. In that case the equation was 
derived through purely mesoscopic reasoning — there were no "islands" of iV tokens, only 
fractions, x, of different variants in a speaker's grammar. In this case in order to derive 
the Fokker-Planck equation, the mutation and migration rates also had to be scaled, 
now not by N, but by A, the weight given to new tokens. The generational time had also 
to be scaled, this time by A~ 2 . Thus the scalings required to derive the Fokker-Planck 
equation are identical in both cases. 



4. Backward-time formulation: coalescent theory 

In Section [2] we described the Wright-Fisher model as a forward-time process, i.e., one 
specifies the probability of having a certain number of alleles in a subsequent generation 
given the number present in the current generation. It is perfectly legitimate instead to 
ask for the probability that a certain number of individuals in the current generation are 
all descended from a set of individuals from the previous generation. As we describe in 
this section, one can reconstruct the ancestry of the present-day population given that 
it has evolved under the dynamics defined in Section [2j We shall also show below that 
this can be used to recover any desired property of the population going forward in time 
from a known initial condition. Thus these forward- and backward-time approaches are 
entirely equivalent. 



4-1. Reconstructing the ancestry 

To describe the backward-time approach in detail, we return to the ideal Wright-Fisher 
population of N haploid individuals described in Section [2j Each of these individuals 
must have a parent from the previous generation; some of them may share a parent, 
in which case there are individuals in the previous generation that did not have any 
offspring. Hence, as one looks back in time the number of ancestors of the present-day 
population must decrease until a single ancestor of the entire population is found. 

Let us assume that at t generations in the past (note that in this section, increasing 
t means going back further in time) there are n ancestors of the present-day population. 
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Figure 2. Graphical representation of the coalescent process. Each vertical line 
corresponds to a lineage leading to the final (present-day) population. Two lineages 
merge whenever the ancestors they correspond to find a common parent. Since this 
coalescent rate is proportional to n(n— 1), where n is the number of lineages remaining 
and is much smaller than the total population size N, the expected time between 
coalescence events considerably lengthens as one goes back in time. 

The probability that any pair of these ancestors have the same parent is 1/N. This can 
be understood from the fact that when going forward in time, a new generation can be 
constructed by assigning to each of the N offspring a parent chosen at random from one 
of the N individuals in the previous generation. The probability that out of any pair of 
offspring, the second receives the same parent as the first is then clearly 1/N as claimed. 
If the number of ancestors is much smaller than the total population size, n <C N, the 
probability that three or more individuals are siblings, or that that there is more than 
one pair of siblings among the subpopulation of n ancestors, is of order 1/N 2 and can 
thus be neglected in the limit N — > oo. As one steps back a single generation, therefore, 
the number of ancestors decreases by one with probability 

*<») = Q*. ("J 

since there are (™) pairs that could find a common parent; otherwise the number of 
ancestors stays the same. 

It is customary to represent this decrease in the number of ancestors in the form of 
a tree with the vertical direction representing time (going upwards corresponds to going 
backward in time). Each branch represents an ancestor of the present day population 
and two branches coalesce when the corresponding ancestors find a common parent- 
see Fig. |2j For this reason this backward-time process was called "the coalescent" 
when formalised mathematically by Kingman in the 1980's [16], although the underlying 
ideas had previously been employed by population geneticists prior to this point (some 
discussion of this earlier work can be found in [47J). 

Another way to picture this process, which may be more appealing for physicists, 
IS cLS cL particle reaction system. Let there be a lattice (or, to be more technically 
correct, a graph) with N sites (nodes), n of which are occupied by a particle. In each 
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Figure 3. Interpretation of the ancestral dynamics as a reaction-diffusion process on 
the complete graph. Each particle represents a lineage, and each particle hops to a 
neighbour, or stays where it is, with equal probability in each time step. Should two 
particles ever coincide on the same site, they react immediately, leaving only a single 
particle. 

timestep, each particle either remains where it is, or hops with some probability to one 
of its neighbours. Should two particles ever occupy the same site simultaneously, they 
immediately coalesce — see Fig. [3j One can verify that the backward-time formulation 
of the ideal Wright-Fisher model corresponds to this reaction-diffusion process on the 
complete graph (i.e., a network in which each site is connected to every other — also 
sometimes called a mean- field geometry). 

From p c one can calculate statistics of the length of time that elapses between a 
state with n ancestors to a state with m < n ancestors (as long as both n and m are 
much smaller than N). This is because p c is constant until a coalescence occurs, and 
hence the time spent in a state with n ancestors has a geometric distribution 

P n {t) = [l-p c {n)] t - 1 p c {n) . (92) 

The mean of this distribution is thus l/p c (n), and since successive coalescence events 
are uncorrelated, the time from n to m ancestors is 

T( - m) ^l +1 ^^ 2 ^l +1 ^T)^ 2A^ (^^)• (93) 

This result shows that the ancestry of a present-day population is dominated by an era in 
which the number of ancestors is small: the mean time in which there are two ancestors 
T(2 — > 1) is more than half the time Tim — > 1) to a single common ancestor from any 
number m N ancestors. In fact, it can further be shown that if one starts with a 
sample comprising the entire population m = N, in the limit of an infinite population 
iV — > oo, after any finite rescaled time r = t/N, the number of ancestors remaining is 
finite (i.e., a vanishing fraction of the initial population) [18]. This is significant because 
it means that the condition for only pairwise coalescences with nonzero probability, i.e., 
that n <C N, is satisfied at any finite rescaled time r in the past. 

This backward-time description of Wright-Fisher population dynamics has a 
number of practical benefits. Firstly, the data that geneticists have to hand comes 
from some present-day sample of individuals, and thus it is sensible to condition the 
outcome of the evolutionary dynamics on this known outcome. Indeed, geneticists are 
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Figure 4. Lines of ascent /descent connecting two sets of individuals, one in the initial 
condition and one in the final condition. Note here that shading does not refer to the 
allelic state of any individual, rather whether it is considered to be within or outside 
the set of interest at the early or late time. The heavy lines show the lines of descent 
from the early-time set that contribute to the late-time set. 

often interested in the information obtained by reconstructing the ancestry of sampled 
genes, and a number of sophisticated methods based on ensembles of ancestral trees 
have been developed with this aim in mind [H]. Secondly, it is very much more efficient 
to simulate the backward-time process: to construct a genealogical tree for n present- 
day individuals with the desired rate involves generating n — 1 waiting times from the 
appropriate geometric distributions (I92I) . Furthermore, having constructed this tree, 
one can superimpose various of models of mutation (such as that described above) 
since selectively neutral mutations (i.e., those that do not cause individuals carrying 
a particular allele to have more offspring than others) by definition do not affect the 
population dynamics. Finally, as we will discuss in Section [5] below, the backward-time 
method generalises more readily to non-ideally mating populations (although one is 
restricted to selectively neutral alleles). 

4-2. Relationship between forward- and backward time dynamics 

Clearly, it should be possible to use the backward-time formulation to calculate 
properties of the forward-time evolution from a fixed initial condition. The way to 
do this is to note that the process of assigning mutant alleles to individuals in the 
initial condition is an example of a (one-off) mutation process that has no effect on 
the statistics of the line of descent. Consider now Fig. HI It shows the lines of descent 
from some initial condition: we can take the individuals shown shaded at the earlier 
time (top) to be mutants, and the remainder to carry the wild-type allele. At the later 
time, we can select some other set of individuals (e.g., that formed by the dark-shaded 
individuals) and ask for the probability that all are mutants. 

Clearly for them all to be mutants, they must all be descended from the mutants 
in the initial condition (if no mutation events take place in the interim). However, there 
may also be other mutants present at the later time outside the set of interest (there are 
two such mutants in Fig. HJ). Therefore, to construct the probability R(a, d; t) that all d 



Stochastic Models of Evolution in Genetics, Ecology and Linguistics 



32 



sampled individuals at late time are descended from one, more or all of the a mutants 
at the initial time from P(b\a;t), the probability that there are precisely b mutants t 
generations after the initial condition, one needs to sum the latter, weighted by the 
probability that the chosen set falls within the late-time mutant set. Given that there 
are b mutants in the population, there are Qj ways to choose d individuals in such a way 

that all are mutants. Since there are in total f^J possible ways in which d individuals 
could have been chosen, we have that 



R(a,d;t)=Y,7W\P(b\a;t) . (94) 



b 



We can ask the same question using the backward-time language. Let Q(c\d;t) 
be the probability that the late-time chosen set of d individuals has coalesced into c 
ancestors t generations previously. For all d late-time individuals all to be mutants, 
their c ancestors must all be drawn from the initial mutant set. Using the same kind of 
counting argument as previously one finds that 

R(a,d;t) = J2jW\Q(c\d;t) . (95) 



Equating these two expressions gives an implicit duality relation between the forward- 
and backward-time distributions P and Q; this can be found for example in [50]. Using 
the identity 

one can write this duality relation to give P explicitly in terms of Q [21]. One finds 

a N ( N )( N ~ b )( a ) 

P(b\a; t) = E U-V b+ AbJ f N b J [cJ Q(c\d-, t) . (97) 

c=ld=b [ c ) 

Since it is usually the case that backward-time quantities are more easily obtained 
than their forward-time counterparts (e.g., by Monte Carlo sampling of the ancestral 
dynamics, or the equivalent react ion- diffusion process), this formula provides one means 
for efficient computation of the evolution from a fixed initial condition. For example, if 
one wants to know the probability that a mutant allele has become fixed by time t, one 
puts b = N in the foregoing to find 

P(N\a;t)=Y l f^Q(c\N;t). (98) 

c=1 UJ 

Actually, in this case Q(c\N;t) is known analytically [IE1 HZ], as indeed is P(N\a;t), 
at least in the limit N —>■ oo and after rescaling allele frequency Xq = a/N and time 
t = t/N [23]. Nevertheless, this approach has recently found utility in applications 
to fixation times subdivided populations in which neither of these distributions can be 
calculated exactly (see [21] and Section [5] below). 
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4-3. Scaling with population size in the coalescent formulation 

In Section [3] we saw that the characteristic timescale of evolution grew linearly with 
population size for the Wright-Fisher model, and quadratically in the Moran model. 
One sees these same scaling relations in the coalescent formulation. Recall that above 
we argued that the probability that two lineages coalesce in one generation undergoing 
Wright-Fisher evolution is p c (2) = -k. Under the Moran model, the probability that a 
pair of lineages coalesces is equal to the probability that one of them was chosen to be 
the parent individual, and the other the offspring. This is p c {2) = or p c (2) = N ^_^ 
if the sampling is done with or without replacement, respectively. The factor of 2 
appears here because there are two ways in which a (parent, offspring) permutation can 
be assigned to a pair of individuals. 

In both cases, the total rate of coalescence if there are n lineages present is 
p c (n) = (^jp c (2) as long as n < iV. Therefore in the limit iV — > oo the ancestries 
of both Wright-Fisher and Moran populations have the same dynamics as long as time 
as rescaled as r = t/N in the former and r = 2t/N 2 in the latter. This is precisely the 
scaling that was previously used to obtain the same Fokker-Planck equation for the two 
models. 

In fact, very many neutral population dynamical processes have the same ancestral 
dynamics as the Wright-Fisher and Moran models in the limit N —>■ oo under the 
appropriate rescaling of time. For example, if one has a model in which the distribution 
of offspring number for each individual is invariant under exchanges of the individuals, 
and one can further define a timescale on which the probability of ternary coalescences 
is vanishingly small compared to binary coalescences, then (subject to other rather 
formal requirements), the ancestral dynamics converges in the limit iV — > oo to that of 
the Wright-Fisher process (see, e.g., [51] for the full mathematical details). Population 
geneticists refer to this property as the robustness of the coalescent; physicists will 
recognise this lack of sensitivity of the late-time behaviour to details of the dynamics in 
large populations kind of universality. 

Furthermore, one can also understand the need to rescale migration and mutation 
rates with the population size N from the backward-time formulation. Consider the 
Moran model with two alleles as defined in Section [2731 The probability that a particular 
ancestor was chosen as the offspring in the previous generation is 1/N, and further that 
the allele mutated after the parent was sampled is either u or v (depending on the allele 
in question). For coalescence and mutation probabilities to be of the same order in N, 
we therefore require u and v both to be of order 1/N. 

5. Non-ideal populations 

We have thus far mostly restricted our discussion to ideal populations, i.e., those in which 
the individual (s) displaced through the production of new offspring are chosen from a 
uniform distribution. We now turn to the contrasting case of non-ideal populations, 
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where the individuals to displace are chosen non-uniformly and investigate the various 
ways in which population subdivision has been handled by population geneticists. 

Non-ideal populations are most easily handled within the backward-time 
formulation, and therefore it is most convenient to describe the migration events in 
terms of the set of probabilities /i^ that any given individual randomly sampled from 
subpopulation (or deme) i is the offspring of a parent individual from the previous 
generation in deme j. For the case i ^ j, this is given simply as the joint probability 
for a parent to be in deme j and an offspring in deme i, which in our earlier notation is 

/'</ = fj9ij ■ (99) 
On the other hand, the probability that an individual in deme % is not an immigrant is 
obtained from the fact that J2j Hij — 1? v i z > 

Vii = 1 - J2 fj9ij • ( 10 °) 

It is worth at this stage to explore these migration parameters a little further. 
First, if subpopulation % comprises iVj individuals, the expected number of immigrants 
per generation is 

Nt^N^Vij. (101) 

Meanwhile, the expected number of emigrants from deme i (defined as the number of 
offspring of individuals that end up in a deme j ^ i) is 

iV? Ut) = £ Njfiji . (102) 

Migration is called conservative if the expected number of individuals entering and 

leaving every deme per generation is the same, i.e., if N$ = N^ out ^ Vz, or equivalently, 
if 

X • V <7'<, X • V ;7';' V * • (1° 3 ) 

Note that this corresponds with the definition of conservative migration previously given 
(|67|) when all deme sizes are equal, iV, = N. Now, let us briefly consider the common 
ancestor of a subdivided population. The single lineage that remains as t —>■ oo will 
hop from between subpopulations at a rate faj from deme i to deme j. Therefore, 
the stationary distribution of the common ancestor (assumed unique) is given by the 
solution of the set of equations 

E^ = E^- (i 04 ) 

We see that if migration is conservative, i.e., if (11031) holds, the probability that the 
common ancestor is in subpopulation % in the steady state is proportional to its size, 

« = m (105) 

where L is the number of subpopulations and iV = ~ X)t=i is their mean size. 
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Clearly, one can construct models of varying complexity and intricacy by suitable 
choices of the migration probabilities ^ and subpopulation sizes iVj, although some 
cases are still excluded from this description. Consider the following classical examples. 

Symmetric island model The simplest model of subdivision has L equal-sized 
subpopulations (islands) of equal size Ni = N from which a uniform fraction /i of the 
individuals on each island are replaced by offspring chosen at random from the other 
islands: y^j = y,/{L — 1) for all pairs % 7^ j. The version of this model that has an 
infinite number of islands was introduced by Wright [2j [52]; the finite island model is 
often attributed to Maruyama [53] and Latter [51]. 

Randomly-mating diploid population with two sexes Recall that a diploid population is 
one that carries two instances of each gene which may or may not be the same allele. A 
diploid population of L individuals can be thought of as a set of L subpopulations, each 
of size 2. One gene from in each subpopulation is sampled randomly from the L m males 
in the populations; the other from the Lf = L — L m females. To represent this model 
via the migration parameters /^y, however, it is necessary to have 2L subpopulations 
each of size 1. We then take //jj = jj— if subpopulation i is a gene that is inherited 
from a male and j is one of the 2L m genes carried by a male in the previous generation; 
and Hij = is the corresponding quantity for females. 

Randomly-mating asexual diploid population with hermaphrodism As an alternative to 
sexual reproduction, the two genes carried by a diploid individual could be inherited 
from the same parent (through a process called selfing) with probability S, or from 
two different parents with probability 1 — S (by outcrossing). This model cannot be 
reproduced using the fiij parameters because the parental distribution is not independent 
for the two genes carried by a single individual. 

It is also possible for populations to be subdivided by age group, for example, if 
only individuals within some age class are fertile. Given that considerable complexity 
is possible within even the simple general model of population subdivision outlined 
here, it is unsurprising that geneticists have sought a small number of parameters with 
which to characterise and differentiate the various specific cases that can be constructed. 
Two quantities that appear prominently throughout the population genetics literature 
are inbreeding coefficients and effective population size. Both of these concepts are 
commonly said to have been introduced by Wright |55j . 

5.1. Inbreeding coefficients, fixation indices and F -statistics 

Geneticists are often most interested in diploid organisms, and there an important 
question is whether the two instances of a gene at a specified locus are the same or 
not. In the absence of mutation, the two genes are the same only if they share a 
common ancestor (a situation described as being identical by descent) . Since this is the 
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case if some inbreeding has occurred, the probability F that two genes are identical by 
descent is sometimes called an inbreeding coefficient. 

Given a sample of some individuals taken from a population, F can be estimated by 
examining specific genetic markers. For example, one can look for different enzymes that 
can catalyse a particular chemical reaction, or look directly at DNA sequences (see e.g., 
[56J for a description of various methods in the context of subdivided populations). 
Of particular interest is the way in which population structure manifests itself in 
observed identity probabilities: typically, one expects there to be a greater chance 
for genes sampled from a subpopulation to be identical, than those from the wider 
(meta)population. 

A quantity that encodes this information and that has become ubiquitous in 
analyses of population structure is denoted Fst, variously called an inbreeding 
coefficient, fixation index or F-statistic, and was introduced by Wright [55]. A number of 
subtly different, sometimes ambiguous, definitions of this quantity can be found dotted 
around the literature — one senses a slight air of frustration arising from this state of 
affairs in the introduction of [57J for example. Following the approach of [58] (and 
others), we avoid getting too heavily into the distinctions between different expressions 
for Fst, and simply adopt a particular definition, namely that provided by Nei [59] 
which reads 

Fst = • (106) 

Here Fq is the probability that two genes drawn at random from the same subpopulation 
are identical by descent, and F is the corresponding quantity for a pair of genes sampled 
randomly from the entire population. Nevertheless this definition leaves open the 
question of whether these samples should be weighted according to the subpopulation 
sizes or not; in practice, this depends on how Fst has been determined from an actual 
sample at hand. 

Despite its ubiquity and utility, there are various aspects of this quantity that are 
confusing at a first encounter. First there is the (rarely elucidated) notation: S and 
T simply stand for subpopulation and total population respectively. With reference to 
diploid populations, one can also define analogous quantities Fjs and Fit which measure 
how much more (or less) closely related the pair of genes contained within an individual 
are than a pair sampled from the population at large. 

Next, one often finds a single expression of Fst quoted for a given model of 
subpopulation, even though identity probabilities in general change with time. Wright's 
infinite island model (defined above) provides one example of a case where Fst is time 
independent. Since the total population size is infinite, the number of individuals across 
the whole population that carry a particular allele does not change with time. However, 
if one takes a subsample of this population of size N, there will be variation in the number 
of alleles of a particular type, and so one finds F ^ F which leads to a nontrivial Fst- 
Wright [52J obtained an expression for Fst by taking the case of two alleles with a 
fraction X{ of the individuals in subpopulation i carrying one of them (say, the wild 
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type). In such a case, one can show that 

f o = )EN + ( 1 - x «)1 = 2^-2x + l (107) 

F = Ta" E [ x &j + (! - - = 2x 2 - 2x + 1 (108) 



where f(x) = ^Ei/(^i)> Hence, 

^5T = "~7Tj ~T • (109) 

x(l — X) 

We remark that in the limit of infinite subpopulations (where there is no distinction 
between sampling with and without replacement), one has < Fst < 1, the lower 
and upper bounds relating to extremes of homogeneous and heterogeneous spatial 
distributions of alleles respectively. These bounds one can see by noting first that the 
numerator of ( 11091) is positive, since it is a variance, and second that since all < Xi < 1 
it follows that < x < 1 and further that x 2 < x. When subpopulations are finite, it 
is possible for Fst to be slightly negative as a consequence of the effects of sampling 
without replacement: having removed one gene from a subpopulation, the probability 
of finding an identical gene in another subpopulation could be of order 1/N more likely 
than in the same subpopulation. Therefore, if one uses (I109p as a general definition 
of Fst rather than (I106p . as is done for example in [5?] , the results obtained will not 
necessarily agree in situations where subpopulations are finite. 

Using (11091) . Wright [52] showed by an explicit calculation of the variance in the 
allele frequencies that for the island model 

Fqt = ^ ~ ^ . (110) 

A quick way to obtain this result is to focus on a single subpopulation and assume 
that any immigrants that arrive from the wider population do not have a common 
ancestor (and are hence unrelated). This assumption can be justified by switching to 
the coalescent picture outlined in Section HJ suitably extended to cater for population 
subdivision. If two lineages are situated within a single subpopulation, they coalesce 
at a rate 1/N, whereas two lineages in different subpopulations will coalesce at a 
rate proportional to fi/L. If the subpopulation size N is held fixed whilst L — > oo, 
a separation of timescales occurs and the probability that two lineages in different 
subpopulations coalesce effectively vanishes. Hence, F is effectively zero (since the 
chance of two randomly-chosen individuals are from the same subpopulation is also of 
order 1/L), and F S t = F . Using now the fact that F does not change with time, one 
can establish that 



(in) 



since the right-hand side of this equation gives the value of F after one generation 
of the Wright-Fisher process (as defined in Section [2]). This comes about because the 
probability that the members of any pair are both descended from individuals from 
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the subpopulation of interest is (1 — yu) 2 ; the probability that both are offspring of the 
same parent is 1/iV; and if they are offspring of different parents, those parents were 
themselves related with probability Fq. Rearranging this equation yields ( 11 101) . Note 
that within this picture, F$t can be interpreted as the probability that two lineages in 
the same subpopulation coalesce before one of them migrates away (looking backward 
in time). 

It is worth looking more closely at how F$t behaves in the various limits that have 
previously been established. First, if the migration probability \x remains finite as the 
subpopulation size iV — > oo, F S t — ► which indicates an absence of spatial structure 
in gene frequencies — precisely what one would expect in the strong migration limit. On 
the other hand, if the combination Nfj, is finite as N — > oo, one has 



to leading order in 1/N. As a consequence of this relationship, a value of Fst 
estimated by sampling from a population is often used as a means to obtain the mean 
number of migrants fj,N arriving in a subpopulation in each generation, this being 
considered an easier task practically than tracking the movement of individuals between 
subpopulations. We also remark that (11101) is not restricted to the special case where the 
migration rate between every pair of islands is the same, since it has only been assumed 
that F is stationary, and that all immigrants into the subpopulation of interest are 
unrelated. In practice this means one requires an infinite total population that is at 
equilibrium, or has stationary mean allele frequencies as a consequence of conservative 
migration. Nevertheless, these are still quite restrictive assumptions and it turns out 
that estimating /j,N in this indirect way is fraught with difficulties [60] . 

There is therefore much interest in exploring how Fst behaves under more general 
conditions, and in particular under models of population subdivision in which migration 
is not conservative, or the number of subpopulations is finite. In these models, allele 
frequencies vary with time, and typically therefore the focus is on the value of Fst 
that is approached as equilibrium is reached. In a model with a finite total population, 
the equilibrium state is one of fixation, as previously discussed in Section [2j Hence all 
identity probabilities tend to unity. However, the ratio that appears in fl 1 6 j) approaches 
a nontrivial finite value, and thus contains some information about population structure 
that can be related to gene diversity. 

A popular way to calculate this limiting value is to exploit a trick due to Slatkin [61] . 
Let us trace the ancestry of a pair of genes sampled from the present day population. 
From the discussion of Section H] we know that eventually a common ancestor of these 
genes are found; let us suppose that this happens at time t with probability c(t). Let us 
also introduce a Poisson mutation process that occurs with probability 9 per generation. 
Then, the probability that the two genes are identical in state (i.e., share a common 
ancestor and have not been subject to any mutations since the time of coalescence) is 



OO 




(113) 
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In the limit 6 — > 0, the concepts of identity by descent and identity in state become 
equivalent, and so asymptotically, 

F OT = Um f °W-ff fl) =l-£ (114) 
bl 8^o i-F(6) T v ' 

where To and T are the mean times for two genes randomly sampled from within or 
between subpopulations respectively to find a common ancestor. In essence, therefore, 
the task is to calculate the mean first passage time for a pair of random walkers which 
hop between sites of a network at rates controlled by the migration parameters /ly. 
This is achieved by solving the set of linear equations 

Tij = 1 + ^2fJ-ikVjiT H (l - -rjrSkj) (115) 
u v 7V fe 1 

where is the mean time for two lineages, one in subpopulation i and one in 

subpopulation j to coalesce. This equation arises because if one has two distinct lineages, 

one needs to wait at least one generation for a change of state to occur. The joint 

probability that the lineages in % and j hop to k and i is /x^/i^. If they arrive in 

different subpopulations, we must then further wait a mean time Tj^ for coalescence to 

occur. On the other hand, if they land in the same place k, there is a probability 1/N k for 

coalescence to occur immediately (i.e., a mean waiting time of zero), otherwise we need 

to wait T kk on average. As with F$t, the question arises as to how to form the averages 

T and T appearing in (j!14p . For example, one could take T = Tu for a specific 

subpopulation of interest. Alternatively, one could average over all subpopulations, 

possibly weighted according to their size. It is not uncommon to see both size-weighted 

and unweighted versions of T in the literature, presumably because the correct form 

depends on whether subpopulation sizes are known experimentally or not (some further 

discussion is given in |58j). 

For arbitrary migration rates ( 11151) can be rather cumbersome to solve, even for 

simple models of population subdivision. Simplifications occur in the limit of slow 

migration, where /iy = m^/iV where N is the mean subpopulation size and is taken to 

infinity. Recall from Section [2] that this is the regime in which both drift and migration 

operate on the same timescale in the limit of an infinite population size; meanwhile, in 

the backward-time formulation, this scaling has migration events occurring on the same 

timescale as coalescence (see, e.g., [62]). In this limit, terms which have both lineages 

hopping simultaneously do not enter into (11151) . Furthermore, it has been shown [63] 

that, if migration is conservative, the mean time for coalescence of lineages located in the 

same subpopulation i is independent of i and given by T# = LN. This fact considerably 

simplifies calculation of Fst for conservative models. For example, Slatkin |61j showed 

that for the symmetric island model with a finite number L of islands undergoing a slow 

migration process, 



1 1 
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which agrees with the result previously obtained by other methods (such as that used 
by [M]; see also [58] ). We observe the the finite-size corrections to the infinite island 
result, (IllOp . are of order 1/L which perhaps sheds some light on the ubiquity of the 
expression (11 101) in analyses of population structure. 

This procedure for calculating the fixation index F$t has been followed for a range 
of models of population subdivision, such as a hierarchical island model [64J (which 
has one finite island model nested inside another), a model with two subpopulations 
of different sizes (58] and a continental-island model, where migration does not take 
place directly between the islands but only via a central continent [58J. These models 
are tractable because of some symmetry or regularity in the migration structure that 
can be exploited. It is also possible to make progress in cases where the number of 
subpopulations tend to infinity [65l l66l [67] . However, we will not discuss these models 
further here. 

5.2. Effective population size 

As we described in the previous section, the motivation behind studying F$t is that it is a 
statistic of a population that can be estimated by examining appropriate data. A second 
characteristic of biological populations that has also received considerable attention is its 
effective size, N e . Like Fst, this notion can also be attributed to Wright [2], although 
it is also understood that Crow and coworkers also made significant contributions to 
its development (see, e.g., the classic text [23]). The basic idea here is that an ideal 
population is entirely characterised by the number of individuals N that are randomly 
mating. In a real population, one suspects that fluctuations (like those associated with 
genetic drift) might be due to some special "breeding" individuals that form a subset 
of the full population. 

There are as many ways to define N e as there are properties of a population that one 
might be interested in; furthermore, these different definitions will not generally give the 
same value for a given model of population subdivision. Nevertheless, for the concept 
to be useful, one hopes that the effective size determined by examining one quantity is 
indicative of how the population as a whole will behave. It is not appropriate here to 
get too involved in all the various definitions and their pros and cons; instead we will 
highlight two particularly prominent definitions and give a flavour of how they depend 
on certain aspects of the population structure. For more extension discussions we refer 
the reader to [681 EH] • 

One of the most natural definitions of effective size arises from the forward-time 
prescriptions of the population dynamics given in Section [2j Recall that under the 
Wright-Fisher dynamics, if the allele frequency in one generation is known to be x', 
in the next it is a random variable x that has variance x'(l — x')/N. This connection 
between the variance of an allele frequency and population size can be exploited to define 
an effective population size in a more general context. Let x! be some measure of an allele 
frequency across a subdivided population, e.g., the arithmetic mean x! = \ J2i=i x 'i, an d 



Stochastic Models of Evolution in Genetics, Ecology and Linguistics 



41 



Var(a;) the variance in this quantity after one generation of the underlying population 
dynamics. An effective size N e of this subdivided population can then be given as 

x'(l-x) . , 

e Var(x) • 1 ' 

To distinguish it from other effective sizes than can be defined, this one is called the 
variance effective size. Note that in this expression the variance is over the distribution 
of allele frequencies x generated by the population dynamics, and not the variance 
over subpopulations appearing in ( 1109D . A drawback of this definition of N e is that 
it is likely to depend on the frequency x, and possibly other statistical quantities- 
such as the variance of allele frequencies over subpopulations. Therefore, if one seeks 
a description of the dynamics that is closed in a small set of random variables, it will 
typically have to be an approximate description (see, e.g., [T0 | [7TI 172] ). 

A definition of effective population size that avoids these difficulties makes use of 
the fact that when tracing lineages of an ideally mating population backward in time, 
pairs coalesce at a rate 1/N. The mean rate of coalescence between pairs of lineages 
in a subdivided population can then be taken to define the reciprocal of an effective 
population size. As with the fixation index F$t, this rate will typically vary with time; in 
those cases, one can adopt the coalescence rate that is approached as one looks infinitely 
far back in time. The effective size that results is then often referred to as an asymptotic 
effective size. In fact, the same limit has been shown to be approached by a range of 
different definitions of effective size [T3| [69] . This fact can traced to all definitions being 
governed asymptotically by the decay rate of the longest-lived nonstationary eigenstate 
of the underlying population dynamics. Furthermore, this feature makes the asymptotic 
effective population size an appealing quantity to devote some effort to calculating. 

As with Fst, there are a few cases where the mean coalescence rate does not 
change with time and can be easily be calculated exactly. The first two such cases were 
introduced at the start of this section. 



Randomly-mating diploid population with two sexes If there are two distinct lineages in 
a diploid population, they must both be inherited from the same parent (and furthermore 
be the same gene from that parent) if they are to coalesce — see Fig. [5j Taking two genes 
from the population at random (with replacement for simplicity), the probability that 
both were inherited from a male, or both from a female, is ~. Within these classes, 
the same gene is chosen with probability an< ^ 2Z7 ^ or ma ^ e an d female parents 
respectively. Hence one finds a total coalescence probability per generation of 

1 ( 1 , 1 \ _ L m + L f _ 1 

A\2L m 2L f J 8L m L f N e ' y } 

where we have identified the coalescence rate with an inverse population size as described 
above. This result was first given by Wright j2]. Notice that N e < 2{L m + Lf), that is, 
if there is any imbalance in the number of males and females, the effective number of 
gene instances in the population is less than their actual number. 
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Figure 5. Sexually reproducing diploid population. Each individual carries one gene 
inherited from a female (shown solid) and one from a female (shown shaded). Two 
ways in which two distinct genes may coalesce are shown. In both cases, the two 
offspring must have been inherited from an individual of the same sex. 



Randomly-mating asexual diploid population with hermaphrodism In this model, the 
effective size can be calculated as a consequence of a separation of timescales. A 
classical calculation can be found in [71] ; here we follow the coalescent-based approach 
introduced in [75], also outlined in [62]. As we have already remarked, this model is 
subtly different to that described in Section in that the parents of two lineages are 
not chosen independently. Instead, should two lineages find themselves within the same 
individual there is a selfing probability S that both are descended from genes contained 
within a single parent individual, and 1 — 5* that they are each descended from different 
individuals. At the time of arrival into a single individual, these two lineages have a 
probability 1/2 of immediately coalescing; otherwise, going back a further generation 
there is a probability S/2 that they are both still in the same individual, uncoalesced, 
or S/2 of coalescing, or 1 — S for the two lineages to return to a state at which they are 
located in different individuals. Using the properties of the geometric distribution, it 
can be shown that two lineages arriving at the same individual have a total coalescence 
probability of 1/(2 — S), and the mean time to do so is 5/(2 — S); on the other hand, 
escape occurs with probability (l — S)/(2 — S) and takes on average 2/(2 — S) generations 
[75| [62j. Meanwhile, when two lineages are in different individuals, the mean time for 
them to arrive in the same individual is L, the number of individuals (since the arrival 
probability per generation is 1/L). Thus, once L is taken sufficiently large at any fixed 
S, the rate of coalescence upon arrival is very much faster than the rate of arrival — see 
Fig. [61 Then, one can approximate the coalescence rate as the product of the arrival 
rate and the subsequent probability of coalescence, i.e., 

w. = m=s)- (119) 

Again, the effective (haploid) population size N e = L(2 — S) is less than twice the actual 
population size, unless there is no selfing (S = 0), when the effective and actual sizes 
are equal. 
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Figure 6. Separation of timescales in a hermaphrodite diploid population. The typical 
timescale T m j g for two lineages to migrate into the same individual is very much longer 
than the typical time to coalescence r coal due to selfing, or to escape to different 
individuals t csc due to outcrossing. 



Population subdivision in the strong migration limit The coupling of a large ratio of 
coalescence to migration rates and small subpopulation sizes is crucial in admitting 
the computation of effective size of a hermaphrodite population. In the opposite 
limit, where migration is very fast compared to coalescence and subpopulation sizes 
are large, a simple, expressions valid for general models of population subdivision can 
also be obtained. The key is to realise that between coalescence events, all lineages 
are independently distributed according to the stationary distribution generated by the 
backward-time migration dynamics. That is, we assume that the probability for two 
lineages to be present in subpopulation i is (Q*) 2 where Q* is the stationary distribution 
for the single common ancestor of the whole population given by the migration balance 
equations ( 11041) . Given that the rate of coalescence for two lineages in subpopulation i 
is 1/N it then follows that under this approximation that the mean coalescence rate is 

J_ = yiQj? (12Q) 
N e ti N i 

This formula has been shown to exact (to leading order in 1/N) if the migration 
probabilities (i^ obey 

Jim Nfi i:j = oo (121) 

N—*oo 

where N is the mean subpopulation size [76]. This equation defines a strong migration 
limit, within which it is seen that the effective population size is always less than or equal 
to the total population size LN . Equality is reached only if the stationary probabilities 
Q* are in proportion to the subpopulation size, i.e., if Q* = Ni/(LN). If the stationary 
distribution is unique, this is the case only if migration is conservative. Since ancestral 
lineages are well-mixed, one does not see any variation between subpopulations, and 
so the approximation of the entire population as an ideal population with a possibly 
reduced effective size would appear to be a good one. 



For more general models of subdivision, where migration probabilities are typically 
of order 1/N, fewer exact results are available. Nevertheless, a formula for the effective 
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population size in terms of the asymptotic changes in identity probabilities has been 
given (69] as 

JL = lim E£.^wn'(i-^w) (122) 

where Fij(t) is the probability that two individuals, one sampled from subpopulation i 
and one from subpopulation j, are identical by descent after t generations, as defined 
in the previous subsection. 

This formula takes a particularly simple form in models when migration is 
conservative. Then, as shown at the start of this section, the probability Q* that the 
single remaining ancestor is in deme i in the steady state is proportional to iVj, the size 
of deme i. Then, reduces to 

■1=-LI_5> (123) 
N e LN 1 - F V ' 

where F and F here are deme-size-weighted probabilities of identity for pairs of genes 
sampled from within demes and from the whole population respectively, that is 

1 L N- 

Ji=lE^ (124) 

*=pEX$>«- (125) 

Defining a size- weighted Fst through (I106P and the formula? for Fq and F given here, 
we find that for conservative models N e and Fst can be related through 

LN , , 

Ne = z =r- • (126) 

When the assumption of conservative migration is relaxed, one does not 
find a simple relation between effective population size and inbreeding coefficients. 
Furthermore, it is not clear that a single parameter (such as N e of Fst) should 
satisfactorily characterise all aspects of the evolutionary dynamics. For example, when 
migration is weak, it has been argued that the ancestral dynamics can only be fully 
expressed in terms of takes "structured coalescent" [77] that takes into account the 
numbers of lineages present in a particular subpopulation at finite times rather than 
the weighted late-time average implied by ( 112211 [781 179] . In such cases, it is therefore 
necessary to examine specific models. For example, the variation of fixation times in 
structured populations undergoing weak migration has been investigated in an exact 
approached based on the duality relation given in Section @] for an ideal population 
[2T] . This allows for a comparison with the fixation time that would be seen in an ideal 
population with an effective size given by (11221) . In a range of models considered, it was 
found that the effective size gives an extremely good prediction for the fixation time, 
except in a particular case where special properties of the network structure admitted 
fixation to occur in a finite time in an infinite population. This finite-time effect has 
also been seen in voter models on heterogeneous (scale- free) graphs [HO, EE] , which in 
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fact is a special case of the general model of population subdivision outlined here with 
all subpopulation sizes Ni — 1. 

6. Selection 

So far we have considered only neutral models of evolution, that is, those for which there 
is no preference for a particular allele. Despite being apparently a reasonable model 
for some aspects of genetic, ecological or linguistic behaviour (as we have previously 
discussed) geneticists in particular have been interested in the fate of alleles that are 
selected for or against. 

The relationship between the genetic make-up of an individual and its survival is 
of course very complicated. However, one can explore the effects of selection by simply 
introducing parameters that determine how many offspring an individual carrying a 
particular allele (or combination of alleles when diploid organisms are being considered) 
has on average. In this section we offer a small taste of some evolutionary models that 
encompass selection. 

Let us return to the model that has a randomly-mating haploid population of N 
individuals with two alleles, denoted A and B. In the neutral model, the parent of 
each individual in an offspring generation is chosen uniformly from all possible parents. 
When selection is active, it is supposed that an individual with allele A (B) is chosen 
to be a parent with a weight wa {wb)- That is, if there are n' A alleles in the parent 
generation, each offspring has a probability 

n'uiA 

n'uiA + (N — ti')wb 

of carrying allele A (at least, if no mutations occur). Since each individual is assigned 
a parent independently, we have that the probability for there to be n A alleles in the 
next generation is 

fN\ ( n'w A 
\n) \n'wA + (N — n')wB 
1 (N 
~ {w') N \n 

where in the second line we have simplified the notation by introducing the mean fitness 
of a population with n' A alleles 

w' = 1 {n'w A + [N — n']w B ) . (129) 

Note that when the two alleles have equal fitness, i.e., wa = wb, (11281) reduces to (J5]) 
for a neutral population. 

It is usual to take a pre-existing 'wild-type' allele (we'll take this to be B) to have 
fitness wb = 1, and the 'mutant' (A) to have fitness wa = 1 + s. Then, the mean 



(N - n')w B 
n'uiA + (N — n')wB 



i \ — ii 



(127) 
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change in the number of mutants in one generation, given that there are n(t) mutants 
in generation t, is 

W f + l))-n (i )_ s ng)/ _m +0(s iy (130) 



N N \ N t 

Denoting the frequency of mutant A alleles as x, the Fokker-Planck equation can be 
shown (e.g., via a Kramers-Moyal expansion or a large- N expansion) to be 

^P(x, t) = s^x(l - x)P(x, t) + 2^|^s(l - X ) P ^ *) ( 131 ) 
when the relative fitness of the mutant s is small. In this instance, small means s C 1, 
and not relative to 1/N. Hence, if at some fixed s one has the (effective) population size 
N ^> 1/s, the effects of drift can typically be neglected. Then the mean allele frequency 
satisfies the deterministic equation 

^ = sx(l-x) (132) 
at 

and one finds a logistic growth in the number of mutants 

x(t) = — r X(0) , N1 . (133) 

W x(0) + [1 -x(0)}e- st V ; 

Typically it is assumed that the stochastic effects of drift are important only 

when one of the allele frequencies is small, and so the logistic growth is taken to be 

representative of the change in frequency of a beneficial mutation in a randomly mating 

population once its frequency has reached some threshold (see e.g., [S2])- This period 

of rapid growth is often referred to as a selective sweep. Of course, a mutant allele is 

present only in small numbers when it first appears, and here one can work out the 

probability that a mutant allele will be lost due to genetic drift. To do this one needs 

to solve the stationary backward Fokker-Planck equation corresponding to fl 1 3 1 [) 



= x(l — x) 



(134) 



subject to the boundary conditions Q(0) = and Q(l) = 1, since Q(x) is the probability 
that a mutant allele becomes fixed given that its initial frequency is x. Integrating once 
gives Q'(x) oc e~ 2Nsx and again gives Q(x) = Ae~ 2Nsx + B where A and B are to be 
fixed by the boundary conditions. This yields [83] 

i _ -2Nsx 

= l _ e -2Ns ■ ( 135 ) 

In particular, if there is initially only a single mutant, x — 1/N, in the limit of an infinite 
population one has 

Jim Q(l/N) = { 2S ° < ' « 1 (136) 

N->oo [ (J s < (J 

where we recall that to obtain (I13ip it was assumed that s is small. From the backward 
Fokker-Planck equation (11341) one can also find the mean number of generations r until 
fixation of a selectively advantageous allele. It is given approximately as 

, = ^ (137) 
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when the combination Ns is large (e.g., s some fixed value and iV — > oo) [24] . 

One may ask how selection affects the backward-time formulation of the population 
dynamics that is couched in terms of genealogies. It turns out that the complexity of the 
calculations increases considerably, because as one goes back in time one needs to keep 
track of the number of alleles of each type present in the population. One can sometimes 
find situations in which there is an equilibrium in these frequencies, for example, when a 
selectively disadvantageous allele (s < 0) is maintained in a population due to recurrent 
mutations generating it [HI]. When there is a selective sweep, it can be shown that 
the relevant genealogies are those that have multiple lineages coalescing simultaneously 
[85] — compare with the case of neutral evolution when the probability of a triple merger 
is suppressed by a factor of 1/N compared to that of a pairwise coalescence. This 
is consistent with the fact that for fixation to be achieved on a timescale sublinear 
in N ( 1137!) : recall, that in a state of fixation all N individuals share a common 
ancestor. One way to formulate the genealogical process with selection is through 
the "A-coalescent" [86] ; meanwhile, certain aspects have recently been elucidated by 
considering the properties of noisy travelling waves [87] . 

Given the discussion of Section [51 one may also be interested in determining how 
effects of selection and population subdivision combine. In a number of ways the 
situation is rather similar to the neutral case, at least if the selective advantage s (or 
disadvantage, if negative) is not spatially dependent. For example, when migration is 
strong one expects fixation probabilities and times to be given (I135P and (I137[) but with 
N being an effective population size of the order of the total population size [88J. In 
the slow migration limit, the mean time for a lineage to hop between subpopulations 
~ iV is much longer than the duration of a selective sweep ~ In N and so typically each 
subpopulation is taken to be fixed in either the wild-type or mutant state. One thus 
calculates fixation probabilities and times by having wild-type subpopulations invaded 
by their mutant neighbours (and vice versa) on the migration timescale, and a flip 
from the wild-type to mutant state occurring with the probability given by Q(l/N), 
and in the other direction with the same expression but with s replaced by — s. When 
conservative migration is in force, it can be shown that the probability a mutant allele 
fixes is independent of the location of the initial mutation [53j [88]: a similar situation 
occurred in the neutral case (although the fixation probability is different). To see 
deviations from this behaviour, one needs either to introduce additional processes (such 
as extinction and recolonisation of subpopulations [89J) or relax the assumption of 
conservative migration [90]. In the latter case one finds that the network structure 
connecting the subpopulations strongly influences whether selection or drift is the 
dominant process. For example, star-like structures amplify the effects of selection and 
can be constructed such that an advantageous mutation appearing almost anywhere is 
guaranteed to fix. On the other hand, it is also possible to contrive networks in which 
fixation occurs whenever the mutation occurs within a particular subpopulation, and 
never if it appears elsewhere. This latter type of behaviour does not, in fact, depend 
on the mutation having a selective advantage: even neutral mutations appearing in 
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"well-connected" parts of a system can fix with high probability when migration is a 
non-conservative process [21] . 

Shifting focus from networks to continuous space, one can write down an equation 
for the advance of an advantageous mutation. Recall that in the absence of drift, one has 
the deterministic equation (11321) for the mutant gene frequency i in a subpopulation. 
With isotropic migration in one dimension with a coordinate u one would augment this 
equation with a diffusion term 



d d 2 
x(u, t) = D——x(u, t) + sx{u) [1 — x(u)) 



:i38) 



dt v ' ' du 2 

This is known as the Fisher or KPP (Kolmogorov-Petrovksy-Piskunov) equation 
and admits travelling wave solutions of the form x(u, t) = f(u — vt) where v is the wave 
velocity. If one anticipates that the leading edge of the wave has an exponential decay 



/(£) = e~ , one finds a range of velocities v = DX + s/X > 2ys/D are possible. It 
turns out that in such an equation, if the interface between the stable and unstable 
phases (here, regions of high and low mutant frequencies) is sufficiently sharp, the front 
velocity selected is the smallest allowed |93j. If genetic drift is reintroduced, e.g., by 
adding a white noise to (11381) with zero mean and variance x(l — x)/N (cf. (j!3ip . but 
note the usual problems that arise from introducing multiplicative noise in such an ad- 
hoc way), one expects strong fluctuations in the leading edge of the travelling wave as 
a consequence of the possibility that a newly introduced mutant may go extinct. This 
causes a very small shift in the velocity of the front and a more diffuse profile than in 
the deterministic case [94J. 

One can also consider models with spatially varying fitnesses. A simple example 
would be if a mutant had fitness advantage +s at position u > and disadvantage — s 
at u < (s here is taken to be a positive number) [95]. At u — > oo one anticipates that 
the mutant allele would be fixed (x = 1), whilst at u —>■ — oo only the wild type would 
be found (x — 0). Using H138[) the steady state can be found by solving the nonlinear 
equation 



D h x{u) 



±sx(u)[l — x(u)] 



(139) 



where the positive sign is taken for u < and the negative sign for u > 0. The symmetry 
of the problem implies that x{u) = l — x(—u), and in particular that a;(0) = |. However, 
one finds the step in the fitness landscape at u = induces a discontinuity in the gradient 
of x(u) at that point. To see this, one must solve this differential equation which is 
achieved by multiplying both sides by and integrating twice. This procedure leads 
to [95] 



x[u I 



tanh 



tanh 



i 

2 ' 



K\ - 



u < 
u > 



where 



K 



1 ln + 

2 n lx/3- v/2, 



1.146216. 



(140) 



(141) 
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-2 



Rescaled position, x = (s/2D) u 



Figure 7. Variation of mutant allele frequency with position in a steady state where 
selection is balanced by migration. The mutant allele has a selective advantage s at 
positions u > 0, a disadvantage —s at positions u < 0. Migration is characterised by 
a diffusion constant D. Although hard to discern on the plot, there is a discontinuity 
in the gradient of the allele frequencies at u = 0. 



This function is plotted in Fig. [7J The spatial variation of allele frequencies due to a 
balance between migration and a changing fitness landscape has been called a dine [96J . 

Finally one can consider variation in fitness not in real space, but in the space 
of genotypes (i.e., possible allele combinations). Let us return to a randomly- mating 
diploid population, in which a fraction x\ of all genes are of allele Ai, i = 1, 2, . . . , n. It 
is fairly straightforward to show [23] that, in the absence of drift, the change in allele 
frequency per generation is 



and Wij is the fitness of an individual carrying the pair of alleles (Ai,Aj). One can 
therefore view the function f(xi,X2, ■ ■ ■ ,x n ) = — lnu> as a kind of free energy defined 
over the space of allele frequencies which the evolutionary dynamics seeks to minimise. 
If we now extend the discussion to multiple gene loci (i.e., genes coding for different 
traits) with interactions between them, it is possible for an extremely rugged fitness 
landscape in the space of all possible allele frequencies to emerge [97] . One can then 
find a transition between two distinct regimes induced by a change in mutation rate: 
if the total number of mutants appearing per generation across the whole population 
(which is governed by the population size N multiplied by the mutation rate u) is small, 
fixation of a selectively advantageous allele is likely to occur before the onset of the next 
mutation (see e.g, [98], [99]). Thus, all individuals in the population are then likely to 
have the same genotype, and one which tends towards a local fitness maximum. On the 
other hand, if the number of mutants appearing per generation is large, and one is likely 




(142) 



where the mean fitness of the population is 

hi 



(143) 
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to see many different genotypes in the population, each corresponding to a different 
fitness maximum. In the latter case the individuals are said to form a quasi-species 
[TOOl 198] 

7. Conclusion 

This article has been a review of the ideas and formalism used to model stochastic 
processes in fields that statistical physicists are not typically acquainted with, specifically 
population genetics, ecology and linguistics. As a consequence, some parts of the 
discussion will seem familiar, other parts will not. We have tried, and we hope that we 
have succeeded, to explain the background ideas and motivation, since this will be the 
greatest obstacle to understanding among a readership of statistical physicists. On the 
other hand the degree of mathematical sophistication that has been assumed is greater 
than would be typical outside physics or mathematical biology. This makes review quite 
different to others in the same area, and while we expect the readership to be mainly 
statistical physicists, we hope that some of those working particularly in ecology and 
linguistics will find our approach to their subject interesting and stimulating. 

In our discussions of the mathematical models, we have mostly used the language 
of population genetics, but through of the mappings discussed in Section 12.51 the results 
obtained are more widely relevant. As the evolutionary paradigm becomes even more 
widely applied, there may be other areas in which analogies can be drawn. It is 
interesting how neutral processes turn out to have greater importance in all three 
areas we discussed; at the very least neutral theories can be thought of a null models, 
against which data and other models can be compared. Most textbooks in population 
genetics begin their discussion of genetic drift with the Wright-Fisher model, although 
for physicists the use of non-overlapping generations and a "time" measured in number 
of generations will not appear so natural. The Moran model, which has exactly the 
same limit when the number of genes become large, is far more familiar, resembling a 
birth/death processes where a death is immediately followed by a birth. In addition, 
the continuous time limit may easily be taken, leading to a master equation of a kind 
well-known in statistical physics. 

We spent some time explaining the relationship between the discrete-time approach, 
based on transition probabilities, and the continuous-time master equation approach, 
based on processes occurring independently at a given rate. The use of master equations 
is apparently rare in the population genetics literature, which is one of the reasons why 
we have gone into this approach in such detail. Furthermore, this latter formalism 
usually turns out to be more efficient. For example, in his book Hubbell [Tj formulated 
his model by analogy with the corresponding discrete-time genetic models, and had 
to perform simulations to generate the stationary probability distribution. However, if 
the model is formulated as a master equation, the stationary probability distribution 
can be obtained analytically [101] . However, the relationship between the discrete- 
and continuous-time formulations is not necessarily straightforward. For example, in 
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Section [XT1 we outlined two ways in which mutation can be included in a continuous-time 
formulation. One is to have combined birth, death and mutation events, with on average 
one occurring per unit time. The other is to have mutation as a spontaneous Poisson 
process (like radioactive decay). We showed that whilst all discrete-time (Moran) 
processes with mutation could be realised using the former approach, restrictions were 
imposed on the latter. 

Another aspect of the article, which to our knowledge has not been elucidated 
elsewhere in the context of these problems, is the emphasis we have put on starting with 
individual (or gene or token) based models and deriving the corresponding mesoscopic 
model, where the limit of the number of individuals, N, tends to infinity. Simulations 
are frequently carried out using individuals, and moreover, as is well-known, an infinity 
of individual based models will give the same mesoscopic model in the large N limit. 
For example, the language evolution model which we have alluded to, was first discussed 
in a mesoscopic context [T7j, but this analysis was restricted to quite simple situations, 
for instance where all speakers spoke equally with each other. When more complex 
situations are analysed, such as allowing for a non-trivial network topology for links 
between speakers [21], the individual-based picture becomes far more useful. In 
particular, the backwards-time or coalescent approach is a far more efficient way of 
performing simulations. In some cases, such as understanding trends in how fixation 
time changes with network structure, which require simulations of quite large networks, 
the coalescent formulation is indispensable. 

In genetics, ecology or language, just as in physics, reality cannot be described by 
ideal models; there will be a multitude of ways in which real systems deviate from the 
ideal models created by scientists when they first enter a field. One of the methods 
that has been devised by population geneticists to deal with this will be very familiar 
to physicists. This is to characterise a non-ideal system by a few parameters, which 
will hopefully, if chosen correctly, capture the essence of the system. It may be that 
a simple model can then be utilised, but with these parameters built in. An example 
is the effective population size, N e , discussed in Section 15.21 which reflects how the 
non-ideal nature of the system changes the effective value of N: the effective size of the 
real population being the number of individuals in the ideal population which gives the 
same magnitude for the quantity of interest. The effective population size and inbreeding 
coefficient, discussed in Section ISTTl are extremely widely used, but unfortunately not in 
a very systematic way. We have attempted to illustrate their use, and to draw attention 
to some of the confusion which exists in the literature, but it is still the case that those 
using these useful concepts will need to be vigilant when reading the literature on them. 

This last observation opens up some questions that may be worth of further 
study. First one can ask whether inbreeding coefficients and the effective population 
size really adequately characterise the evolution of a non-ideal population. Are there 
other approximation schemes that might allow deviation from ideal behaviour to be 
described in a systematic and controlled way? Secondly, the backward-time description 
of neutral evolution will be recognised by nonequilibrium statistical physicists as the 
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react ion- diffusion process A + A — > A. Although exact solutions for this process exist 
for certain mathematically convenient geometries (such as continuous real space [102] 
and, as we have seen, the complete graph) there is interest in extending the analysis to 
heterogeneous graphs, which thus far have been treated only using approximate methods 
[801 18T1 12T] . Such solutions would be useful as they would obtain more detail about the 
evolution of the system, for example, the manner in which an innovation propagates 
through the system en route to fixation. There are many other questions which would 
benefit from a systematic study of the type that statistical physicists are well-placed 
to carry out. We hope that this article has stimulated some readers to find out more 
about these fascinating applications of stochastic methods and to contribute to the 
development of the field. 
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Appendix A. The large N expansion of the master equation for M alleles 



In this Appendix we give details of carrying out the large N expansion beginning from 
the master equations involving M > 2 alleles, considered in Sections 13.3.21 and 13.3.31 

We begin by expanding out the terms containing the transitions rates (1701) — which 
are those involving the type M allele — letting x a = n a /N, and expanding all other terms 
in powers of 1/N. This is the procedure used to obtain Eq. ( H6i) . and the analogous 
expression in this case 
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neglecting terms of order 1/A 3 and higher. Carrying out the same procedure for the 
terms containing the transition rates (15^]) gives an expression 
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again neglecting terms of order 1/iV 3 and higher. Adding Eqs. (1A.1I) and (1A.2[) gives for 
the master equation 
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up to terms of order and higher. As in previous cases, if we define a new time 

t = 2t/N 2 , and take N -> oo, then we find the result Eq. (|7T|) . 

In the main text we discussed two different ways of including mutations. In the first 
type of model, mutation rates were given by Eq. (ITS]) . This equation may be written as 



where 
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Suppose that we first look at the case where a, (3 = 1, . . . , M — 1 in the transition rate 
(]A.4j) . This gives the following contribution to the right-hand side of the Fokker-Planck 
equation: 
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The terms involving only the mutation rates are 
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The analogous expression when a = 1 , . . . , M — 1 and (3 = M is: 
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which lead to the analogous expressions to Eq. (1A.6I) : 
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From the expressions flA.6j) - flA.8l) . the term involving mutations in the Fokker-Planck 
equation can be found. For example, let us focus on the mutation rates involving A a 
and Ap with a, (3 = 1, . . . , M — 1 and a ^ (3. After some straightforward algebra, the 
terms from Eqs. (1A.6j) and (1A.7I) which involve P, but not derivatives of P are 
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using Eq. (1A.5j) . The terms from Eqs. (1A.6I) and (1A.7I) which involve derivatives of P 
are more complicated, but eventually yield 
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We may combine Eqs. (I A. 91) and flA.101) to give 
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Introducing the scaled mutation rates defined in Eq. (1771) . this expression equals 
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P + 0[^). (A.12) 



Using the rescaled time r = 2t/N 2 and letting A" — > oo, we see that we obtain the 
mutation terms displayed in Eqs. (1751) and Eqs. (179"|) . at least in the sector where 
a, [3 — 1, . . . , M — 1. In a similar way, the mutation terms where the allele Am is 
involved may be derived from Eqs. (1A.7|) and (IA.8I) . One again finds the result given by 
Eqs. (CHD and Eqs. (I79|) . 

The form of the mutation terms in the Fokker-Planck equation in the second scheme 
we investigated, defined by the transition rates ( 1531) . follows from the results obtained 
above for the first scheme. This is because if one takes u a p = u a for all /3 ^ a, then the 
transition rates of the two approaches are identical, although in the second case there 
is the restriction that Y^sLi u s < 1- We therefore find the same Fokker-Planck equation 
( 1781) . but now with Eq. ( 1791) taking the form 

M M 
A a (x) = {UaXp ~ UpX a ) = U a - ^ UfsX a , (A.13) 
Q=\ (3=1 

which is Eq. (1851) . When written out in terms of the scaled mutation rates, the condition 
on the us in the second case reads 

E^<17> (A.14) 

6=1 Z 

which, since the rescaled rates are numerically small and we have taken the limit 
A^ — > oo, is always satisfied. 

Finally, we consider the most general case where there are L islands and M alleles. 
As we remarked in the main text, this involves only minor modifications of calculations 
which have been carried out previously. This is because, since both the mutation and 
migration rates are scaled by a factor of N, any product of these rates will give a term 
of order l/N 3 , and so will not contribute to the Fokker-Planck equation. Therefore we 
may consider mutation and migration separately, and simply need to insert appropriate 
indices on the contributions already calculated. 

First let, us consider Eq. (1551) . which does not include mutation. If we begin by 
ignoring the migration terms g^ with i ^ j, then we see that the transition rates will 
be of the form given in Eqs. ( 1691) and ( 1701) . but multiplied by fi and with an index i 
included. Thus the result of a 1/N expansion is as in Eq. (1A.3j) . but with an index % 
and fi included: 

j ( M-l Q2 m-i M-l Q2 "1 
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up to terms of order 1/A^ 3 and higher. This has to be summed over all i. 

Returning to Eq. (155|) . the transition rates which involve migration can be found 
by analogy with the discussion in Section [3.3. 1[ and in particular Eq. (1601) . with a and 
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(3 indices for the allele labels. These appear as in Eqs. (lA.ip and (IA.20 . The result is: 
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Scaling the migration rate by introducing = NGy/2, adding an analogous term 
which represents migration from island i to island j, and summing over all pairs of 
islands (ij) gives 
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The third and final term is found from Eq. fl89|) ignoring the migration terms g^- with 
i 7^ j. This is as in Eq. (l76|) . but multiplied by and with an index i added to the np 
and n$. From the discussion leading to Eq. ( 1A.12I) we find that the contribution for the 
first type of mutation is 
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This has to be summed over all i. The second type of mutation gives the same result, 
but with A a {x.i) defined as in Eq. fl85|) . 

Putting Eqs. (IA.15I) . (1A.17I) and (j A. 18ft together gives equation fl90l) . a general 
Fokker-Planck equation describing a population of genes, of M possible types, 
subdivided between L different islands. 
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